We all now know that chatbots and language-based AI learning systems are profoundly interesting and seem to produce confounding output. Google has created LaMDA, a chatbot so good that one of Google’s research staff thought it had sentience. OpenAI has ChatGPT, Anthropic has Claude. A whole range of systems for language and other types of generative output, such as pictures and programming, are now built on these models.
Meta (Facebook) has created a chatbot, 'Cicero', to play the multi-player strategy game ‘Diplomacy’. This game pits seven players against each other, and they are required to negotiate their way to a winning position in a pre-First-World-War Europe. During testing, this chatbot not only won more games than most of its human opponents, it also fooled hundreds of other participants (across many games) into thinking it was a real person. And it clearly could negotiate effectively to reach winning positions.
We know that these learning systems are fed with 'lots' of data and that, once trained, nobody can work out how they derive their 'answers' to human or other inputs. They do, however, take time to train, and they require significant computing resources to operate.
We are told that these 'generative' AI models use layered neural networks. Each layer produces a better match to the input, drawing on the 'memory' learnt from the training data sets; as the layers progress, the system gets closer to the best match. We are also told that systems like ChatGPT only produce one word at a time, each word being dependent on the words that came before it. The results are pretty amazing given this seeming simplicity.
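To make the 'one word at a time' idea concrete, here is a minimal sketch of that generation loop in Python. The toy scoring function is purely hypothetical and stands in for the trained network; real systems work over sub-word tokens and vastly larger models, so treat this as an illustration of the loop, not of how GPT-class systems are actually built.

```python
# Minimal sketch of next-word generation: pick each new word based only on
# the words generated so far, then repeat. Illustration only.
import random

def score_next_words(context):
    # Stand-in for a trained neural network: returns candidate next words
    # with probabilities conditioned on the context (toy values, hypothetical).
    toy_model = {
        "the": [("robot", 0.6), ("dance", 0.4)],
        "robot": [("turns", 0.7), ("dances", 0.3)],
    }
    last = context[-1] if context else "the"
    return toy_model.get(last, [("around", 1.0)])

def generate(prompt_words, max_words=5):
    context = list(prompt_words)
    for _ in range(max_words):
        candidates = score_next_words(context)
        words, probs = zip(*candidates)
        context.append(random.choices(words, weights=probs)[0])  # sample one word
    return " ".join(context)

print(generate(["the"]))   # e.g. "the robot turns around around around"
```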
So, how does Meta's Cicero manage to interact meaningfully, in context and with goal orientation, creating enough 'useful strategy' to fool other competent players? This makes little sense. If these AI systems are just modelling potential phrases one word at a time and arriving at some sensible-looking end result that fits, how on earth can Cicero achieve the results recorded? No one seems to be explaining any of this.
New Research April 2023
Microsoft:
https://arxiv.org/pdf/2303.12712.pdf
" We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system"
"We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting"
So, CST wonders why this research is not leading other industries to tap into these abilities and attempt to show, physically, what current AI systems can and cannot do.
What we don't know
CST does not understand why the scientific community is, in the main, so certain that AI models will not eventually take us all by surprise and create outputs that confound our current understanding of AI science.
Current neural networks consist of thousands or millions of processing units (nodes) connected to each other. Each node has connections of varying strengths to other nodes in the network. As the network analyses huge amounts of data, the strengths of those connections change as the network learns to perform the desired task.
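As an illustration of that description (and only as an illustration), here is a toy network in Python with a couple of weighted connections whose strengths are nudged as it sees data. The task, learning a logical AND, and the training scheme are deliberately trivial; real networks scale the same idea up to millions or billions of connections.

```python
# A single "node" with two weighted connections, trained by nudging the
# connection strengths to reduce its error on each example it sees.
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training data: inputs and the desired output (logical AND).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [random.uniform(-1, 1), random.uniform(-1, 1)]  # connection strengths
b = 0.0                                             # bias
lr = 0.5                                            # learning rate

for _ in range(5000):
    x, target = random.choice(data)
    out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
    # Nudge each connection strength downhill on the error (gradient descent).
    grad = (out - target) * out * (1 - out)
    w[0] -= lr * grad * x[0]
    w[1] -= lr * grad * x[1]
    b    -= lr * grad

for x, _ in data:
    print(x, round(sigmoid(w[0] * x[0] + w[1] * x[1] + b), 2))  # close to 0,0,0,1
```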
Brain vs AI
In recent years researchers have turned to models built through a technique known as contrastive self-supervised learning. This type of learning allows an algorithm to learn to classify objects based on how similar they are to each other, with no external labels provided. Notably, the resulting models generated activity patterns very similar to those seen in the brains of animals performing the same tasks as the models (see MIT research).
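For readers who want the gist of the technique, the sketch below shows the core contrastive idea in Python: pull two 'views' of the same object together and push different objects apart, with no labels involved. The random embeddings stand in for a network's output, and the loss is a standard InfoNCE-style formulation, not necessarily the exact one used in the MIT work.

```python
# Contrastive self-supervised learning in miniature: no labels, just
# "two views of the same thing should be close, different things far apart".
import numpy as np

def contrastive_loss(view_a, view_b, temperature=0.1):
    # Normalise embeddings, then compare every item in view_a with every
    # item in view_b. The matching pair (same index) is the "positive";
    # all other pairs are "negatives" to be pushed apart.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    sims = a @ b.T / temperature              # pairwise similarities
    positives = np.diag(sims)                 # same-object pairs
    # Softmax cross-entropy: make the positive stand out from the negatives.
    loss = -positives + np.log(np.exp(sims).sum(axis=1))
    return loss.mean()

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 32))                           # 8 "objects", 32-dim embeddings
noisy_view = batch + 0.05 * rng.normal(size=batch.shape)   # a second, augmented view
print(contrastive_loss(batch, noisy_view))
```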
Such findings suggest that humans are already creating some neural networks that 'may' operate in a similar fashion to a mammalian brain. OK, these neural networks may currently be very poor, and in no way comparable to even the lowest mammalian brain.
The obvious 'unknown' here is that we do not understand how the brain functions anyway. So whether we really 'know' what these neural networks are doing, or what their potential is, is clearly unknowable at this time. Scientists normally work in silos and are happy to continue their research on 'currently accepted' principles. It is not in their interests or their brief to look beyond this to wider issues. CST takes a different view: we look at the wider experimentation, plus the ability of nature to do stuff that seems impossible with tiny resources.
One example is 'scary silicon', where, many years ago, a programmable chip (an FPGA) was allowed to 'evolve' its own configuration over 250 or so generations until it accurately re-created a square-wave input as an output. But when the researchers reviewed the design it had evolved, they found it was not a normal digital circuit: it exploited analogue effects instead, and part of the chip's workings seemed completely unconnected to the rest. We also know that some birds, such as crows, exhibit profound toolmaking and problem-solving behaviour. A crow's brain is tiny but still has about 1.5bn neurons.
So, is it 'likely' that, as we create improved and larger (or linked) AI neural networks with better structures and processing, we shall inadvertently create sentience of some sort? The model behind ChatGPT already has around 175bn parameters, roughly twice the number of neurons in a human brain (though a parameter is closer to a synapse than to a neuron). But of course these are probably not at all comparable, mammalian brains having been developed over millions of years. The 'point' is that we simply do not know much at all about neural networks; it is unexplored territory. We know a little about their structures and very little about how they function or how sentience occurs.
Let us do some thought experiments. Consider the example of 'scary silicon'. What if we coupled up several well-trained large AI systems and came up with a viable plan to let them 'evolve' amongst themselves? Come back in several months' time (and after spending a lot on electricity): what would we find? If such experiments allowed the AI systems to create other AI neural networks, and allowed the best to 'evolve' into new ones, what would we see?
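The 'evolve and select' loop behind this thought experiment is simple to sketch. In the toy Python below the 'candidates' are just bit strings scored against a fixed target; in the thought experiment they would be entire AI systems, and the fitness test would be far harder to define, which is of course where the interesting questions lie.

```python
# A minimal evolutionary loop: score candidates, keep the best, mutate to
# produce the next generation. Purely illustrative; the candidates and the
# fitness function here are hypothetical stand-ins.
import random

def fitness(candidate):
    # Hypothetical scoring: how closely the candidate matches a fixed target pattern.
    target = [1, 0, 1, 1, 0, 1, 0, 0]
    return sum(1 for a, b in zip(candidate, target) if a == b)

def mutate(candidate, rate=0.1):
    # Flip each bit with a small probability.
    return [1 - bit if random.random() < rate else bit for bit in candidate]

# Start with a random population of 20 candidate "systems".
population = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]

for generation in range(250):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                                  # keep the fittest
    offspring = [mutate(random.choice(parents)) for _ in range(15)]
    population = parents + offspring                          # next generation

best = max(population, key=fitness)
print("best candidate:", best, "fitness:", fitness(best))
```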
This research must surely already be happening; if not, then why not? CST therefore predicts that AI systems will take us by surprise in the not-too-distant future.
Dmitry Krotov, a research staff member at the MIT-IBM Watson AI Lab and senior author of a research paper into brain functions, Aug 2023:
“The brain is far superior to even the best artificial neural networks that we have developed, but we don’t really know exactly how the brain works. There is scientific value in thinking about connections between biological hardware and large-scale artificial intelligence networks. This is neuroscience for AI and AI for neuroscience,”
Well, call CST dumb, but the above is so obvious that it says something else about current research and the basic approach to AI. Why on earth would we not start with brain functioning and ask how brains evolved to get where they are today? It seems absurd that we treat the brain as a sort of magical instrument, instead of treating it as what it is: a working machine of very high efficiency. To be elucidating this in Aug 2023 is quite ridiculous. We should have started here twenty years ago.
The ability of 'nature' to conjure up effective strategies seems to have been completely missed so far by 'data scientists'. Perhaps it is because data scientists simply do not think about natural processes. Perhaps they are too involved in thinking the world is data and binary. Of course it is not. CST suspects that the brain will be using many techniques to leverage its power and performance: certainly analogue techniques, and maybe even quantum techniques.
So, CST 'expects' some very significant breakthroughs in future due to this obvious gap. Researchers will start using new strategies, especially as the 'funding' flowing into new AI projects has just gone ballistic.
Linked AI Systems
Anurag Ajay, a PhD student at MIT: “All we want to do is take existing pre-trained models and have them successfully interface with each other,”
Compositional Foundation Models for Hierarchical Planning (HiP) develops detailed, feasible plans with the expertise of three different foundation models (models of the same broad kind as OpenAI’s GPT-4). HiP uses three foundation AI models, each trained on a different data type. Each one captures a different part of the decision-making process, and they then work together when it is time to make decisions.
HiP’s three-pronged planning process operates as a hierarchy, with the ability to pre-train each of its components on different sets of data, including information outside of robotics. At the bottom of that order is a large language model (LLM), which starts to ideate by capturing all the symbolic information needed and developing an abstract task plan. Applying the common sense knowledge it finds on the internet, the model breaks its objective into sub-goals. For example, “making a cup of tea” turns into “filling a pot with water,” “boiling the pot,” and the subsequent actions required.
HiP was cheap to train and demonstrated the potential of using readily available foundation models to complete long-horizon tasks. This research is a proof of concept of how models trained on separate tasks and data types can be combined into models for robotic planning. In the future, HiP could be augmented with pre-trained models that can process touch and sound to make better plans.
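A minimal sketch of the sub-goal decomposition described above is shown below. The ask_llm function is a hypothetical stand-in for a real language model call, and the lower (visual and action) levels of HiP are only hinted at; this illustrates the layering, not HiP's actual implementation.

```python
# Sketch of hierarchical planning: a language model breaks an abstract goal
# into sub-goals, which lower levels would turn into observations and motor
# commands. ask_llm() is a hypothetical placeholder, not a real API.

def ask_llm(prompt):
    # Stand-in for an LLM call; canned answer for the tea example only.
    canned = {
        "Break 'make a cup of tea' into sub-goals":
            ["fill a pot with water", "boil the pot", "put a tea bag in a cup",
             "pour the boiled water into the cup"],
    }
    return canned.get(prompt, [])

def plan(goal):
    # Top level: abstract task plan from the language model.
    sub_goals = ask_llm(f"Break '{goal}' into sub-goals")
    # In HiP, video and action models would refine each sub-goal further;
    # here we simply print the abstract plan.
    for i, sub_goal in enumerate(sub_goals, 1):
        print(f"{i}. {sub_goal}")

plan("make a cup of tea")
```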
Again, this research is new (Jan 2024), so WTF have these researchers been doing for the last 20 years? This just seems so obvious to CST, we would have expected this to be very well researched and implemented by now.
Another question immediately comes to mind: how do these systems scale? It’s all very well creating a learning process that takes many years to work as we would like, but if these AI systems are to be used in the real world by many millions or billions of people or companies, they need to be scaled up to handle that number of users.
If they cannot easily or quickly be scaled, then their use is limited to a few specialised environments, such as research screening of documents, results and other complex, data-intensive work. While this is useful and has already been shown to great effect in discovering new science (such as DeepMind’s AlphaFold, which predicts final protein shapes from their underlying sequences), there may currently be a block on wider use.
Very little seems to be published on this scalability issue. There is a new business venture called ‘Modular’ which asserts: “We saw that fragmentation and technical complexity (for AI platforms) held back the impact to a privileged few”. Their aim is to create a simpler, modular AI platform that scales. This suggests that scalability is a very big issue, so why is nobody discussing it openly?
The most difficult issue to understand is the lack of any published testing on real-life physical (or virtual) robotic systems. What we are looking for here is simply taking the output from the AI and feeding it into a functional system that does something in the real world. This is very simple to achieve; the easiest way is just to use a virtual robotic system. Alphabet's DeepMind has evolved specific AI to integrate with board games and specialist research, but these are not generalist chatbots trained to communicate seamlessly with humans. There are many implementations of chatbots for customer data-processing systems, but these seem to be simplistic tools within a pre-set communication environment.
The best AI systems can output (seemingly) sensible advice from human spoken or written input. We also know that Meta's Cicero, the ‘Diplomacy’ chatbot, already made good negotiating decisions.
We can therefore envisage a simple test in which some additional training is provided, linking words and phrases to specific functional outputs for the virtual robot. For our simple test these could be spatial movements of various parts, various sensor measurements, time, accuracy, sequential processes and so on.
By training the AI to understand these specific robotic commands and how they link to everyday words and phrases, the AI should become accurate in interpreting how to 'tell' the robot to function from human written language input.
As all of these chatbot-type AI systems use language as a means of ‘understanding’ our input, they would presumably have no difficulty in ‘understanding’ a new range of words and phrases that link directly to the virtual robotic functions. If we ask a chatbot today what turning completely around means, it will tell us that this means turning through 360 degrees, or give some similar description. So, once re-trained to use the limited but exact virtual robotic functions, if we asked the chatbot to turn its robot completely around, it should have no difficulty in outputting the command ‘turn through 360 degrees’, or some simpler commands that achieve this exact result.
The virtual robot, which has a specific hard-wired response to this input, would indeed turn itself completely around. We can already achieve this simple, direct level of functionality with fairly simple AI, e.g. ‘Hey Google’ or ‘Alexa’ controlling connected devices: light on or off, speaker volume up or down.
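A minimal sketch of the proposed harness might look like the Python below: a small, exact command vocabulary for the virtual robot, plus a placeholder chatbot function standing in for the re-trained AI that maps everyday phrasing onto those commands. Every name here is hypothetical; in the real test the re-trained chatbot itself would emit the commands.

```python
# Sketch of the test harness: an exact robot command set, and a placeholder
# "chatbot" that translates everyday language into that limited command set.

ROBOT_COMMANDS = {
    "turn": lambda degrees: print(f"[robot] rotating {degrees} degrees"),
    "move": lambda metres: print(f"[robot] moving forward {metres} m"),
    "stop": lambda: print("[robot] stopping"),
}

def chatbot(instruction):
    # Placeholder for the re-trained chatbot: in the proposed test this
    # mapping would come from the AI itself, not hand-written rules.
    if "completely around" in instruction:
        return [("turn", 360)]
    if "half turn" in instruction:
        return [("turn", 180)]
    return [("stop",)]

def run(instruction):
    for command, *args in chatbot(instruction):
        ROBOT_COMMANDS[command](*args)

run("please turn your robot completely around")   # -> [robot] rotating 360 degrees
```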
But, you ask, why not just tell the robot to ‘turn through 360 degrees’? True, but then we have to talk in robot language that is perfectly aligned to the very limited input the robot understands. For more complex results we would need to start ‘programming’ our spoken input to get the exact output we required. And humans would still be ‘in the loop’.
Using our chatbot, we could be very inaccurate with our requirements. Let’s say we asked our chatbot: what does it look like for a robot to do a dance? The chatbot would presumably reply with a description of a robot doing some sort of continuous movement.
CST asked:
What does it look like for a robot to dance the jive to a rhythmic tune?
Chatsonic replied, (no Google data turned on):
“For a robot to dance the jive to a rhythmic tune, the robot would need to be programmed with the moves of the jive. It would also need to be able to recognize and respond to the rhythm of the tune in order to move accordingly. The movements could be done by the robot's motors and servomotors moving its body parts, or it could be done by the robot performing pre-programmed choreography"
and second test:
"A robot dancing the jive to a rhythmic tune is an impressive sight. It requires precise movements and coordination of the robot's parts to move in harmony with the beat. It can be programmed to perform a variety of moves, from basic steps to more complex sequences. The robot's movements must be synchronized with the music in order for it to look like it is actually enjoying the dance."
A third test, with Anthropic's Claude, was by far the best:
What would it look like for the humanoid robot to dance the ‘jive’ to a rhythmic tune? Explain what the robot is doing. Also, if you can, explain how you might provide the inputs to this robot to achieve the result that you imagine the robot undertaking.
Claude: - Got it, let me walk through what the dance could look like and how I might make that happen in our imagined scenario:
1) Describing the dance moves and actions of the robot: To dance the jive to a fast, rhythmic tune, I would make the robot execute moves like kicks and flicks with its legs and feet while keeping the upper body relatively vertical, moving with quick and lively footwork. The robot would tap its toes, swing its legs, hop on alternating feet, and bend its knees in time to the music. I'd command it to hold its arms out to the sides, then move them in striking motions up and down, matching the beat of the music and sensing the rhythm to punctuate the leg kicks, splits, and repetitive hopping steps. The jive is meant to be a fast, lively dance so I would execute precise joint rotations to make the robot's motion snappy but smooth, generating visually engaging poses with sharper transition points. I might do things like snap its legs in, have it quickly stamp its feet, or flap its hands in between the stretches and calls that mark a series of basic jive dance moves.
2) Inputs and control scheme: To make all this dancing happen, I would need to be able to issue high-level movement commands to the robot, which its onboard control system would translate into the necessary joint torque and motor activations over time. For example, I could issue a command like "LeftLeg.Kick(intensity=0.7)" along with a beat time. The robot would then carry out that kicking motion, making sure to hit and return from the peak intensity level at the beat specified. I would listen to the music track, predict appropriate beat timings, and script out a series of these movement commands - kicks, hand motions, foot flicks, hops, etc. I'd tweak their intensity and timing parameters until the dance looked good. Some real-time adjustment once the routine starts would probably be necessary as well by watching the rhythm and feel.
I could also reuse routine sequences of motions, like a repeated 4-beat sequence for hopping onto alternating feet in a fixed pattern. By combining these modular move components into flexible scripts timed to the music, I should be able to program some fairly lively jive sequences! Let me know if this helps visualize what such a dance routine might look like and how I could use customized, parameterizable commands to choreograph the robot's motion. Please feel free to provide any additional scenarios or details to further expand the thought experiment!
CST - Interestingly, these tests suggest that the Chatsonic AI ‘understands’ the issues around the movement of a robot and how it might be programmed to do such a task. Claude goes yet another step further. Not only does it seem to 'understand' the concept, it provides a strikingly good summary of actually attempting this task. It matters not that it doesn't really understand; it ONLY matters that it can achieve the task.
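Claude's imagined control scheme, parameterised moves scheduled against beat times, is easy to picture in code. The sketch below is hypothetical, with invented move names and classes, but it shows how such high-level commands could be lined up to a tempo.

```python
# Sketch of a beat-timed, parameterised movement command scheme of the kind
# Claude describes. All names and values here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Move:
    part: str         # e.g. "left_leg"
    action: str       # e.g. "kick"
    intensity: float  # 0.0 .. 1.0
    beat: int         # which beat of the music to hit

def choreograph(bpm, moves):
    seconds_per_beat = 60.0 / bpm
    for m in sorted(moves, key=lambda m: m.beat):
        t = m.beat * seconds_per_beat
        print(f"t={t:5.2f}s  {m.part}.{m.action}(intensity={m.intensity})")

# A 4-beat fragment at a fast, jive-like tempo.
choreograph(176, [
    Move("left_leg", "kick", 0.7, beat=1),
    Move("right_leg", "flick", 0.6, beat=2),
    Move("both_arms", "swing", 0.5, beat=3),
    Move("left_leg", "kick", 0.7, beat=4),
])
```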
So, our new 'Turing' test simply asks our AI to action the virtual robot and make it dance. The AI would translate its own idea of dance into robot movements of its limbs.
This is the new Turing test!
This is the big, fundamental question, the one that describes what we are really observing from these AIs: is there any deeper understanding, or only the simplistic shuffling of words matched to our input?
We can see that this is a very simple experiment. Has it been done yet? And if so, why are the people at Google, Meta, DeepMind, OpenAI and Anthropic not telling us the outcome?
This experiment could change everything overnight. If the experiment showed that the AI could make the robot dance, and perhaps also to a tune with specific dance types, it would clearly demonstrate that the AI does know the difference between learnt phrases and real actions. If it can pass this simple test, then we are already a long way down the road towards Smart Robotics, as these new AIs are already significant in their (apparent) understanding of our world and what we really mean.
This new Turing test, now renamed the CST-Test, fundamentally changes the way we consider current AI: is it just a clever lookup ‘phrase book’, as Google has suggested in response to the LaMDA sentience issue, or is there some deeper understanding, as Microsoft researchers have hinted at?
It is impossible to derive where sentience starts (many have tried), and CST does not care: a crow can solve puzzles and react effectively to the world around it, but it is not intelligent as humans define intelligence.
What we do care about is moving the world forward through the development of the Smart Robot. And the new CST-Test establishes exactly whether the chatbot/AI actually understands what we ‘mean’ rather than just what we say.
The point here is that humans interacting with chatbots and using AI-generated outputs are confused: they 'assume' more 'understanding' than is actually needed for the task. That is because it is relatively simple to output a 'copy' of something similar to what already exists in the data the AI is trained on. So our new test defines a complex task that has a real-world event attached to it, and one that requires significant physical output to make it useful.
If an AI passes the CST-Test, then that is the ‘Smart’ in Smart Robotics, and it will do very nicely for the next 50 years or so to provide for our Smart Robotic Revolution... And then the world changes... forever.
June 2024
See the paper by Leopold Aschenbrenner on the future of AI and his assertion that we are on course for AGI by 2027:
https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf
And yet, nobody is yet discussing the 'Smart Robot' that will surely follow... CST cannot believe this... Perhaps there is a hidden plan?