by Peter Szabo and Dmitry Volkinshtein
Supervised by Dr. Yaakov Engel
Abstract
The octopus arm is known for its unique
muscular hydrostat structure: it applies force using muscles alone,
without any rigid skeletal support. The biomechanical attributes of such an
arm enable it to perform tasks that no skeletal arm can perform. Hence, a robotic
implementation of an octopus arm with a real-time learning control mechanism would
yield a highly versatile system, with robotic capabilities not yet seen. Due to the
arm's continuous nature and high level of complexity, no existing machine-learning
algorithm has been shown to give practical results in terms of time and
space consumption.
The problem
The goal of the project is to teach a 2-dimensional model of the octopus arm to
reach a given point in space with reasonable time and space consumption and with a high
success rate.
The solution
To solve this intricate problem, we applied a novel Reinforcement Learning approach,
On-Line GPTD, to the 2-dimensional multi-segmented dynamic model of the
octopus arm.
The Octopus arm model
The model we used is
a 2-dimensional segmentation of the arm, utilizing only masses and springs
for its dynamic characteristics. The arm is divided into (N-1) rectangular
segments, each defined by four vertices. For simplicity, the muscles are
treated as massless, and the arm's entire mass is concentrated
in point masses. The point masses are located at the four vertices of
each segment, giving a total of 2N masses. The idealized massless springs
function as muscles and connect all the adjacent point mass pairs of the
model. The 2N masses are arranged in N pairs, each consisting of one ventral
and one dorsal mass. (N-1) ventral and (N-1) dorsal longitudinal muscles
connect the N ventral and N dorsal masses respectively. In addition, a
transverse muscle connects each ventral-dorsal pair. Figure 1 shows the
general structure of the modeled arm. The simulation supports various
physical parameters, such as gravity, the specific weight of water and the
specific weight of the arm. Within the simulation parameters, the user defines
a set of activations (i.e., sets of muscle forces for each
segment). These activations enable the complex movement of the
octopus arm. The simulation was implemented in an earlier project
by Chen Kojokaro and Keren Sasson, also supervised by Yaakov Engel.
For more details, see the simulation project's site.
Figure 1: The general structure of the modeled arm.
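As a rough illustration of the topology described above, the following Python sketch enumerates the point masses and massless springs for a model with N ventral-dorsal pairs. The function name `build_arm_topology` and the index layout (even indices ventral, odd indices dorsal) are assumptions made for this example; they are not taken from the original simulation code.

```python
def build_arm_topology(n_pairs):
    """Return (masses, springs) for an arm with n_pairs ventral-dorsal pairs.

    Hypothetical layout: mass 2*i is the ventral mass of pair i and
    mass 2*i + 1 is its dorsal partner.
    """
    if n_pairs < 2:
        raise ValueError("need at least two pairs to form one segment")
    masses = list(range(2 * n_pairs))
    springs = []
    for i in range(n_pairs):
        ventral, dorsal = 2 * i, 2 * i + 1
        # One transverse muscle per ventral-dorsal pair.
        springs.append(("transverse", ventral, dorsal))
        if i + 1 < n_pairs:
            # Longitudinal muscles connect each pair to the next one.
            springs.append(("ventral", ventral, ventral + 2))
            springs.append(("dorsal", dorsal, dorsal + 2))
    return masses, springs


masses, springs = build_arm_topology(11)  # N = 11 pairs, i.e. 10 segments
```

With N = 11 this yields 2N = 22 masses and 3N - 2 = 31 springs: 11 transverse, 10 ventral and 10 dorsal.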
On-Line GPTD (Gaussian Processes for TD learning)
The algorithm estimates the value function of an MDP online. Its central
idea is to impose a Gaussian Process prior over the value function, which
encodes our prior knowledge: E(V(x))=0, E(V(x)V(x'))=k(x,x'). Here, k(x,x') is a
kernel function which reflects our prior knowledge concerning the
similarity of values between two states. Using the Bellman equation
and the prior above, the relation between the observed rewards and the
values along a trajectory can be written in matrix form.
Combining these matrices with an efficient sparsification method and the
standard theorems on Gaussian random variables yields the on-line
GPTD algorithm.
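To make the prior and the matrix form more concrete, here is a minimal Python sketch. It assumes a Gaussian kernel with a hypothetical width parameter `sigma` and a deterministic trajectory with discount factor `gamma`; neither choice is stated in the text above, and the helper names are illustrative only. For deterministic transitions under a fixed policy, the Bellman equation gives r_t = V(x_t) - gamma * V(x_{t+1}), so the rewards along a trajectory equal H times the vector of values, where each row of H holds a 1 followed by -gamma.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Assumed kernel choice: k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def bellman_matrix(num_states, gamma):
    """Build the (num_states - 1) x num_states matrix H with rows [.., 1, -gamma, ..]."""
    rows = []
    for t in range(num_states - 1):
        row = [0.0] * num_states
        row[t] = 1.0
        row[t + 1] = -gamma
        rows.append(row)
    return rows


def mat_vec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Values of three consecutive states on a deterministic trajectory (made-up numbers).
values = [3.0, 2.0, 1.0]
H = bellman_matrix(len(values), gamma=0.5)
rewards = mat_vec(H, values)  # r_t = V(x_t) - gamma * V(x_{t+1})
```

In the stochastic case a noise term is added to this relation; the sketch covers only the noise-free case.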
The Learning System
We have implemented both stochastic and
deterministic on-line GPTD algorithms in C++, creating a general-purpose,
standalone algorithmic module. Using this module as an
on-line value function estimator, we have
implemented different policy iteration methods: OPI (Optimistic
Policy Iteration, which includes ε-greedy and
softmax), Interval Estimation and Actor-Critic. In addition, the
system enables various reward and goal settings. A general scheme of
the learning algorithm is shown in Figure 2.
Figure 2: The learning algorithm.
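The two action-selection rules mentioned for OPI can be sketched as follows. This is a generic Python illustration, not the project's C++ code: it assumes that an estimated value for each candidate activation is already available as a plain list, and the parameter names `epsilon` and `temperature` are hypothetical.

```python
import math
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda a: action_values[a])


def softmax_probabilities(action_values, temperature=1.0):
    """Boltzmann distribution over actions; subtracting the max avoids overflow."""
    top = max(action_values)
    weights = [math.exp((v - top) / temperature) for v in action_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_choice(action_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(action_values, temperature)
    return rng.choices(range(len(action_values)), weights=probs, k=1)[0]
```

Setting `epsilon` to 0 reduces the first rule to the greedy policy, while a lower `temperature` makes the softmax rule concentrate on the highest-valued action.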
Performance Evaluation
For evaluating the arm's performance, we saved the GPTD matrices
(the value function) at several points during the learning process. For each saved
snapshot, a greedy simulation was run from each state in a pool of randomly
distributed initial arm states. For each initial state, we recorded whether the
arm reached the goal under the greedy policy and, if so, how long the trial lasted. This
data was then rendered into learning curves, mean run time curves and success rates.
We expect the success rate to increase over the course of learning,
and the mean run time to decrease correspondingly.
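The two summary statistics can be computed from the per-trial records as in the Python sketch below. The record format (a success flag plus a duration) and the decision to average run time over successful trials only are assumptions made for illustration; the text does not specify how failed trials enter the mean.

```python
def summarize_trials(trials):
    """Summarize greedy evaluation trials for one saved value function.

    trials: list of (reached_goal, duration_seconds) pairs, one per initial state.
    Returns (success_rate, mean_run_time); mean_run_time is None if no trial succeeded.
    """
    if not trials:
        raise ValueError("no trials to summarize")
    # Assumption: only successful trials contribute to the mean run time.
    durations = [duration for reached, duration in trials if reached]
    success_rate = len(durations) / len(trials)
    mean_run_time = sum(durations) / len(durations) if durations else None
    return success_rate, mean_run_time


# Made-up example: four initial states, three of which reach the goal.
rate, mean_time = summarize_trials(
    [(True, 2.0), (False, 4.0), (True, 3.0), (True, 1.0)]
)
```

Repeating this for each saved snapshot gives one point per snapshot on the success-rate and mean-run-time curves.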
Results
In our experiments,
we used an octopus arm simulation with 10 segments, which yields
reasonable complexity, yet is still flexible enough to manipulate. The
state vector has 88 dimensions, since 10 segments result in 22
masses, where each mass has a position (x, y) and a velocity (dx,
dy). In general, a basic set of
5-10 muscle activations was used, the trial time limit was set to 4
simulation seconds and each simulation step lasted 0.4
sec.
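The dimension count can be checked with a few lines of Python. The helper names and the ordering of components inside the flat vector are assumptions for this sketch; only the counts (10 segments, 22 masses, 4 numbers per mass) come from the text.

```python
def state_dimension(num_segments):
    """Number of state components for an arm with the given number of segments."""
    num_masses = 2 * (num_segments + 1)  # N - 1 segments correspond to N pairs of masses
    return 4 * num_masses  # x, y, dx, dy for every mass


def flatten_state(masses):
    """Turn a list of (x, y, dx, dy) tuples into one flat state vector."""
    return [component for mass in masses for component in mass]


dim = state_dimension(10)
# An arm at rest at the origin, used here only to check the vector length.
state = flatten_state([(0.0, 0.0, 0.0, 0.0)] * 22)
```

Both `dim` and `len(state)` come out as 88, matching the state vector size quoted above.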
We present two examples of our results. The rest can
be found in the project book.
1) This is one of our basic learning tasks, in
which the arm's base is fixed (the two base masses cannot
move), and the goal can be reached with any
of the arm's vertices. In addition, the gravitational acceleration was set to
9.8 m/s^2, as on Earth's surface.
Simulation demos
2) This is one of our advanced learning tasks.
This time, the arm's base can rotate in both directions, and the
goal can be reached only with the two masses at
the extremity of the arm. Here too, the gravitational acceleration was set to
9.8 m/s^2, as on Earth's surface.
Simulation demos
Conclusions
We believe our results speak for themselves. In every task that was
presented to the learning system, the agent eventually succeeded in
reaching the goal with high rates of success, demonstrating the capabilities
of GPTD in practice. All this was accomplished with a dictionary that is
small in comparison with the total number of different states visited
by the agent. Finally, we believe that the GPTD-based octopus arm can
handle even more complex and realistic learning tasks, namely those
typical of octopi. These would include reaching any point in space (not just
one given point as shown above) and even chasing a moving goal.
A further development would be enabling physical interaction between
the goal and the arm, which would allow the arm to manipulate the goal's
location in space. The arm might pull the original goal towards a different
location in space (a second goal), as a real octopus would drag food into
its mouth.
Acknowledgment
We wish to extend our sincere gratitude
to our instructor, Dr. Yaakov (Yaki) Engel, for his advice, support and
guidance. We would also like to thank the PSPL staff and chief engineer Johanan
Erez for providing us with an impeccable working environment and allowing
us to occupy computers for long simulations. We also wish to thank
the Ollendorff Minerva Center, which supported this project.
Related Information
Project Book
Presentations [Midterm][Final]
Source Code [Octopus Arm (Learning System & Statistics)][Initial States Creation]
Simulation Movies