Notes for a talk given by Peter Small at the Boston BOT2001 seminar, 19th June 2001
A self-organizing, living database for volatile data
by Peter Small
The two previous talks in New York and San Francisco dealt with the theoretical and design aspects of living databases (or organic databases as they are often called).
This talk will be concerned more with the application of living databases and how they can be used with the concept of stigmergy to create self-organizing Web sites.
The demonstration will be of a prototype system for building and maintaining a database for cancer treatment trials. This is a subject where the information is highly volatile and comes from numerous sources worldwide.
Using conventional database technology, it would be prohibitively expensive to build and maintain such an information source. Using an organic system that is self-organizing as a result of user inputs, the setting up and maintenance costs are minimal.
The problem to be solved
In many areas of information technology, the information to be processed is rapidly expanding, evolving and changing.
This is usually where the subject matter is strongly influenced by:
new research findings
These ingredients result in highly volatile information that is difficult and expensive to locate and maintain.
Realizing the difficulties
The problem can be illustrated by considering how cancer patients might be informed about current cancer research, availability of treatments and treatment trials. To create a Web site to provide this information would seem to be a perfectly straightforward proposition: that is, until you start to consider the difficulties involved.
There are over 400 types of cancer. Each can have several kinds of variation. Patients can be at various stages of the disease's progression. They may or may not have had previous treatments. The cancer might be or might not be operable. Every different combination of these variations will require a different kind of treatment and involve different kinds of research. Even a very simple structure, to separate out these basic considerations, will result in a hierarchy that has several thousand different end points.
At each of these end points, specific and detailed information has to be delivered to cover a plethora of human activity involved in the research and treatment trials in the niche area concerned.
Several thousand nodes to be monitored? Several hundred thousand sources of informational content to be located, checked and authorized? This problem becomes impossibly difficult with information that is constantly changing and evolving.
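As a rough check on the scale, the following minimal sketch multiplies out the basic distinctions. The figure of 400 cancer types comes from the text above; the four stages, the operable split and the prior chemotherapy split are the distinctions used in the example later in this talk. Treating every cancer type as having exactly these options is a simplifying assumption, not a clinical fact.

```python
from math import prod

# Number of options at each level of a very simple hierarchy.
# All counts other than the 400 types are illustrative assumptions.
levels = {
    "cancer type": 400,        # "over 400 types of cancer"
    "stage": 4,                # four stages of severity
    "operable": 2,             # operable or not operable
    "prior chemotherapy": 2,   # previously treated or not
}

# Every combination of choices is a separate end point of the hierarchy.
end_points = prod(levels.values())
print(end_points)  # 6400
```

Even with only these four distinctions, the hierarchy already has several thousand end points, each of which would need its own monitored content.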
Content management systems
Compilation, organization, maintenance and distribution of information is normally handled through some kind of Content Management System (CMS).
Content management systems are big business. Most comprehensive systems cost $2m - $5m (and even more if they need to be integrated with other systems). It is conservatively estimated by Faulkner Information Services that this market will increase to $65 billion by 2003.
Why are CMS systems so expensive?
Content management systems are expensive because it is deemed essential to be able to organize and have complete control over all aspects of an information system. Some of the considerations might be:
The coordination of a large variety of different information coming from different sources
Synchronization of related content
Keeping check on the responsibility for content accuracy and legality
Continuous monitoring and updating of content
Constant rearrangement of the navigation system
Need to make immediate changes when errors occur or information is superseded
Need to cater for a large variety of different users who have different kinds of access
Content may need to be in several languages
Each individual user may need to have a different and unique version of the content
Areas of the content may have to be safeguarded through limited or privileged access
The list can go on and on, so it is easy for content management system (CMS) vendors to make out a strong case to justify the cost of an expensive content management system.
Does a content management system solve all the problems?
As long as you spend enough money, you can solve all the problems - except for the most important one:
the costs of running and maintaining the system.
As a hierarchical information system expands, the total activity at the nodes and ends of the branches increases exponentially. Whatever controls are in place, all this activity must involve people: content producers, authorizers and checkers.
If the system is dealing with highly volatile content, the cost of collecting, organizing, monitoring and updating the information can be prohibitive, however well the system is managed.
This is the reality that killed off so many dotcoms over the past two years.
Is there a better way?
Content management systems are universally associated with databases. But why? This is not the approach used by swarming insect colonies, who have similar problems involving information gathering and dissemination.
Ant colonies, for example, have no central database. They rely on a system of organization called stigmergy to guide the worker ants to currently available sources of food.
Technical note: Stigmergy is the name given to a system of organization that occurs as a result of an interaction between individuals and their environment. In essence, individuals make changes to the environment and the changes they make have an effect on their behavior (a positive feedback effect).
Ants mark their environment with pathways by laying trails of pheromones as they search for and collect food. Evaporation of these pheromone trails causes the pathways to fade and disappear if they are not regularly used. This provides an automatic updating process, which ensures only the most recent and most successful pathways are maintained.
There is no need for a database. There is no need for controls or even communication between individuals. The system is organic, self maintaining and self regulating. It is this concept, of stigmergy, that lies behind the principle of a living database.
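The reinforcement and evaporation of pheromone trails can be sketched as a simple update rule. This is a toy model only: the evaporation rate, deposit amount and cut-off threshold are assumed values chosen for illustration, and the trail names are hypothetical.

```python
EVAPORATION = 0.5   # fraction of a trail lost each time step (assumed value)
DEPOSIT = 1.0       # pheromone added by an ant walking a trail (assumed value)
THRESHOLD = 0.05    # below this a trail is treated as vanished (assumed value)

def step(trails, walked):
    """Advance one time step: evaporate every trail, then reinforce walked ones."""
    updated = {}
    for name, strength in trails.items():
        strength *= (1.0 - EVAPORATION)      # evaporation applies to all trails
        if name in walked:
            strength += DEPOSIT              # use reinforces the trail
        if strength >= THRESHOLD:
            updated[name] = strength         # faint trails disappear entirely
    return updated

# Two trails start out equal; only one still leads to food and keeps being used.
trails = {"to_rich_food": 1.0, "to_empty_food": 1.0}
for _ in range(6):
    trails = step(trails, walked={"to_rich_food"})

print(sorted(trails))  # ['to_rich_food']
```

No ant ever reports that the second source has dried up. The unused trail simply fades, which is the automatic updating process described above.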
The efficiency of a stigmergic system
To appreciate the efficiency of a stigmergic system, it might be useful to consider how an ant colony might be organized if it used a similar, central database type solution to that used by most content management systems.
All information relating to the environment and sources of food would be stored at a single location: perhaps with the queen ant. This information would have to be placed there by worker ants, after they discovered new sources of food. The queen ant would have to organize this information to be able to identify the location of the sources, which would need an internally constructed map of the environment.
There would be excessive traffic around the queen as worker ants reported in with their intelligence and others reported in to find out where to go for food. There would be time delays: between an ant finding a food source, and that information reaching the queen.
When a source of food dried up or disappeared, the queen ant would not know about this immediately. She wouldn't know unless an ant came back to the nest to inform her. She'd still be directing ants to empty sources unless the ants were genetically programmed to report back to her after failed missions.
Adding an updating procedure to an ant colony's repertoire of behaviors would significantly increase the activity of the nest and the complexity of the whole system.
Compare this imaginary scenario to what actually happens: where ants pick up their information from pheromone trails without having to keep reporting back to the nest. Isn't this activity so much less complicated and more efficient?
A distributed database is better than a point source
It is easy to see how a centrally organized database would present many problems for an ant colony. So, ants have opted for a distributed database, in the form of pheromone trails spread out across their environment.
Clearly, having all information relating to food sources built into pathways created within the environment is a far more efficient way to run things than concentrating the information at a single point source.
Users create their own pathways to the information they need
Similar to the way ants lay trails to food sources, users of a stigmergic information system can create pathways to head towards niche areas that are of special interest to them. In the Cancer Treatment World example, the trails lead to a country, then to a town in that country, then to a building in that town and finally to a meeting room in that building.
All pathways end at meeting rooms, where people following a particular pathway will meet others who by virtue of their choice of pathways will have a similar interest.
If a needed information source doesn't exist, any user can bring one into existence (that is, open a special interest meeting room) simply by creating pathways to it. People with similar informational needs would follow the paths opened up by such initiatives, resulting in a narrowly focused group meeting at the place where the path ends. There is no need to name or specify in advance the subject matter of a meeting room: it will be determined by the choices provided by the path initiators.
Pathways opened up by users are allowed only a limited life. The life is extended only if they are used regularly and give rise to useful activity in a meeting room. This emulates the evaporation process of ant pheromone trails, ensuring only up to date pathways are maintained.
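The limited life of a pathway can be sketched as an expiry day that is pushed forward by useful activity. The 30 day lifetime, the Pathway class and its method names are hypothetical, introduced only for illustration; the talk does not specify how lifetimes are measured or stored.

```python
from dataclasses import dataclass

LIFETIME_DAYS = 30  # assumed lifetime; the talk gives no figure

@dataclass
class Pathway:
    description: str
    expires_on: int  # day number on which the pathway lapses

    def record_useful_visit(self, today: int) -> None:
        """Extend the pathway's life when it gives rise to useful activity."""
        self.expires_on = today + LIFETIME_DAYS

def live_pathways(pathways, today):
    """Return only the pathways that have not yet lapsed."""
    return [p for p in pathways if p.expires_on > today]

# Two pathways are opened on day 0; only the first is used again (on day 20).
paths = [Pathway("stage 4", expires_on=30), Pathway("stage 1", expires_on=30)]
paths[0].record_useful_visit(today=20)      # now expires on day 50

remaining = live_pathways(paths, today=40)
print([p.description for p in remaining])   # ['stage 4']
```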
Example of a pathway being built
To illustrate how users create their own pathways to meeting rooms, the demo shows the path building activity of a single person: an oncologist, working for an international drug company, who is conducting treatment trials for colorectal cancers.
This activity can be likened to that of a single ant, creating the first trail in a virgin landscape.
The oncologist wants to lay a path for people with both colon cancer and rectal cancer. He begins by specifying paths to towns that he creates in these countries (he does this simply by determining the pathway descriptions).
For this particular oncologist, the towns he would like to see in the countries dedicated to colon and rectal cancer would be such that they separated out the people into four stages of the severity of their condition. He names these stages 1, 2, 3, and 4 (also a, b, c, d).
He includes a help note at this decision point, so that patients following the pathways he is creating can decide which of the four paths is best for them to take (which of the towns to go to).
The trials he is conducting are for drugs designed to help patients who are at the most advanced stage of cancer, so he continues to create paths to buildings only in the towns that are concerned with this most severe stage (stage 4 or stage d).
For his trials, this oncologist would also like to separate the patients who have an operable condition from those who haven't. He does this by creating paths to two buildings in each of the towns he has created (which come into existence when he specifies the pathways). One building is for people with operable cancers, the other is for those whose cancers are not operable.
The oncologist would also like to see the patients separated into different groups according to whether or not they have received prior chemotherapy treatment. He does this by establishing two different rooms in each of the buildings he has created.
Simply by creating this series of pathways from country to town to building to room, the oncologist has separated out from the huge variety of different possible conditions only those patients whose conditions match the basic requirements for his cancer treatment trials.
From the patients' point of view this works very well because it doesn't waste their time investigating all kinds of different conditions that do not really concern them.
Those that do have the condition specified by the oncologist's paths are led directly to meeting rooms where they will be able to meet not only the oncologist who opened up the pathway, but also everyone else who follows this particular trail. This will concentrate together all the people who have a special interest in the particular narrow areas of specialty that this oncologist's paths lead to.
The path building activity of this single oncologist will be repeated by hundreds of others as they forge paths to different meeting rooms to satisfy their different needs. Different kinds of oncologists, physicians, doctors, specialists, patients and their friends and relatives will form their own paths to the particular areas that interest them. This will open up a myriad of different meeting rooms, every one dedicated to a specific type of cancer and set of conditions.
In this way, Cancer Treatment World fills up with pathways in the same way as the landscape around an ants' nest fills up with pheromone trails.
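The oncologist's path building can be sketched with a nested dictionary in which a place comes into existence the moment a path to it is specified. This is only an illustration of the country, town, building, room idea: the function names and labels are hypothetical, and the sketch holds the structure in memory rather than in the files the real system uses.

```python
DEPTH = 4  # country -> town -> building -> room

def create_path(world, steps):
    """Walk the given steps, creating any place that does not yet exist."""
    node = world
    for name in steps:
        node = node.setdefault(name, {})
    return node

def rooms(node, trail=()):
    """Yield the full trail to every meeting room (a place at full depth)."""
    if len(trail) == DEPTH:
        yield trail
        return
    for name, child in node.items():
        yield from rooms(child, trail + (name,))

world = {}
for country in ("colon cancer", "rectal cancer"):
    # Four towns per country, one for each stage of severity.
    for stage in ("stage 1", "stage 2", "stage 3", "stage 4"):
        create_path(world, (country, stage))
    # Buildings and rooms are only created in the most severe stage.
    for building in ("operable", "not operable"):
        for room in ("prior chemotherapy", "no prior chemotherapy"):
            create_path(world, (country, "stage 4", building, room))

print(len(list(rooms(world))))  # 8
```

Nothing about cancer is built into the structure itself. The meaning of each place comes entirely from the path descriptions chosen by the person who laid the trail.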
It's about people, not content
Most content management systems are designed around a hierarchical structure determined by the nature of the content. This works fine when dealing with stable and reliable information, but if content varies unpredictably, such a system is inherently unreliable.
People produce the content.
People view the content.
People inspect the content.
People judge and evaluate the content.
People authorize the content.
If content is volatile and constantly changing, then it is necessary to base an information system around people rather than the content. This is what a stigmergic system does: it is totally independent of content. It is concerned only with the actions of people.
By emulating the reinforcement and evaporation effects of pheromones, branches to people who produce valuable content are maintained and strengthened. Branches that lead to people who produce unsuitable, inaccurate or out of date content either fade away or are terminated.
Being independent of content, a stigmergic system is fundamentally generic. It is highly adaptable and flexible: perfectly suited to information that is prone to unpredictable change.
Example of a stigmergic information system
The Cancer Treatment World example uses a stigmergic system to solve the problem of providing cancer patients with information on cancer research, treatments and treatment trials:
The system is organic, self maintaining and self regulating.
There is no database.
There is no pre-planned organization or hierarchy.
Overheads and maintenance costs are low.
Information in the meeting rooms
Cancer Treatment World does not provide any content. It provides only the facilities for users to create routes to meeting places where they can share information with one another.
This sharing of information is normally through personal agents (see next sections), but facilities are offered for users to post up details of Web sites, newsgroups, discussion forums, treatment trials, etc., that are specifically related to the condition covered by the meeting room.
Following the pathways created by the example oncologist (through colon or rectal cancer, stage d, not operable, after prior chemotherapy) leads to meeting rooms that show an example of a Web site added by the oncologist. Under the listed trials for this condition, the oncologist has also provided a link describing the trial he is conducting (together with the entry conditions).
Using personal agents to communicate
Hierarchical paths in Cancer Treatment World do not lead directly to information; they lead to people in virtual meeting rooms. In these meeting rooms people pool their knowledge and share information.
This is where the personal digital agents come in. They can act on behalf of people: to increase the efficiency of communication and at the same time provide a high level of individual privacy.
The need for privacy in such a delicate situation as cancer treatment is obvious. This makes it imperative that communication is indirect.
How personal agents work
In every meeting room people can pose questions. These questions are not directed to any particular person but are added to a growing list of questions stored in the meeting room (every meeting room has its own separate list of questions). This list of questions is then referred to whenever visitors to the meeting room want to create a personal agent for themselves.
Agents are created only in meeting rooms. During the creation process, the owner looks at the questions that have been stored in the meeting room they are interested in and answers as many questions as they can. These answers are then built into their agent's knowledge base.
After an agent has been built, it is left in the meeting room where it was created for other agents to be able to ask it questions. Conversely, a person with an agent in a meeting room, can get their agent to ask other agents for the answers to particular questions their owners have provided.
In this way, everyone can have access to the combined experience, knowledge and opinions of everyone at the meeting place, without any personal or direct contact. New questions and answers can be added at any time.
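The question list and agent mechanism described in this section can be sketched as follows. The class names, method names and the sample question are hypothetical and the storage is held in memory for brevity; this shows the interaction pattern, not the way the prototype is actually built.

```python
class Agent:
    def __init__(self, owner_alias, answers):
        self.owner_alias = owner_alias   # an alias, not the owner's identity
        self.answers = dict(answers)     # question -> the owner's answer

    def ask(self, question):
        """Return the owner's stored answer, or None if none was given."""
        return self.answers.get(question)

class MeetingRoom:
    def __init__(self):
        self.questions = []   # this room's own growing list of questions
        self.agents = []      # agents left in the room by their owners

    def pose_question(self, question):
        if question not in self.questions:
            self.questions.append(question)

    def create_agent(self, owner_alias, answers):
        # Only answers to questions already stored in the room are kept.
        known = {q: a for q, a in answers.items() if q in self.questions}
        agent = Agent(owner_alias, known)
        self.agents.append(agent)
        return agent

    def ask_all(self, asker, question):
        """Collect answers from every other agent in the room."""
        return {a.owner_alias: a.ask(question)
                for a in self.agents
                if a is not asker and a.ask(question) is not None}

room = MeetingRoom()
room.pose_question("Is the trial still recruiting?")
first = room.create_agent("visitor-1", {"Is the trial still recruiting?": "Yes"})
second = room.create_agent("visitor-2", {})

replies = room.ask_all(second, "Is the trial still recruiting?")
print(replies)  # {'visitor-1': 'Yes'}
```

The owners never exchange messages directly. Each sees only what other agents have been given permission to answer, which is how the indirect communication preserves privacy.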
Not a database in sight
It may well seem that this routing of a myriad of pathways through a vast hierarchical system would need to be set up within a database. It would also seem to require a database to facilitate the meeting places and the creation and interaction of the personal agents.
Not so. The whole system is run on a server without a database or any kind of server side control. Mysterious? No, this is the way ants do it.
The server acts in the same way as a queen ant. It doesn't provide information but provides genes to enable individuals to be able to create and read message pathways. It also provides genes that enable individuals to create agents and communicate through them with other agents.
The genes in this case are small code modules that are built into Web pages. They empower the client side to write routing instructions, very similar to the way genes in worker ants allow them to write pheromone messages. Similarly, these gene-like code modules allow the client side to interpret messages on stateless documents so as to know where it is in a hierarchy, in much the same way as an ant can know where it is in a landscape of pheromone trails.
It may seem strange to be doing this without a database, but, an ant would think it even stranger to do it with a database.
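One way a stateless document can tell the client where it is in a hierarchy is for every link to carry the whole trail walked so far. The sketch below assumes the trail is encoded in the URL query string, which the talk does not state, and uses Python to stand in for the client side code modules. The function names, parameter names and the page name room.html are all hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

LEVELS = ("country", "town", "building", "room")

def write_route(base_url, trail):
    """Build a link that carries the whole trail walked so far."""
    query = urlencode(dict(zip(LEVELS, trail)))
    return f"{base_url}?{query}"

def read_route(url):
    """Recover the trail from a link alone, with no server side state."""
    params = parse_qs(urlsplit(url).query)
    return tuple(params[level][0] for level in LEVELS if level in params)

# A visitor three levels down: the link itself records the position.
link = write_route("room.html", ("rectal cancer", "stage 4", "not operable"))
print(link)
print(read_route(link))
```

Because the position travels with the link, the server only has to hand out static files. It never needs to remember who is where.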
A big issue with content management systems that deliver content through a database is security. Highly sophisticated protection is needed to prevent hackers from altering the system or the content. This is a non-issue as far as stigmergic systems are concerned because there is no database for hackers to break into. There are only files.
Files on a Web server can of course be altered, but, they can easily be refreshed at frequent intervals (from an isolated source). This limits any permanence of hacker activity to only the time between refreshment cycles.
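A refreshment cycle of this kind can be sketched as a routine that restores any served file that differs from its master copy. The directory layout and the function name refresh are assumptions made for illustration. The sketch only restores files that exist in the master copy and leaves any other files in the live directory untouched.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path

def refresh(master, live):
    """Restore every file under `live` that differs from its master copy."""
    restored = []
    for source in Path(master).rglob("*"):
        if not source.is_file():
            continue
        relative = source.relative_to(master)
        target = Path(live) / relative
        # Compare file contents, not just size and modification time.
        if not target.exists() or not filecmp.cmp(source, target, shallow=False):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            restored.append(str(relative))
    return sorted(restored)

# Demonstration in a throwaway directory: one served file has been tampered with.
with tempfile.TemporaryDirectory() as tmp:
    master, live = Path(tmp, "master"), Path(tmp, "live")
    master.mkdir()
    live.mkdir()
    (master / "index.html").write_text("original")
    (live / "index.html").write_text("defaced")

    restored = refresh(master, live)
    content = (live / "index.html").read_text()

print(restored)  # ['index.html']
print(content)   # original
```

Run at a fixed interval from a machine the Web server cannot write to, a routine like this limits the lifetime of any alteration to one cycle.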
Easily adaptable to other scenarios
The motivation driving the example of Cancer Treatment World is the need for cancer patients, their friends and relatives to find out about suitable treatments and trials applicable to a patient's unique condition. More pertinently, for revenue reasons, the same system helps clinicians and drug companies find suitable patients for their trials (where the candidates have to have specific profiles). This situation can be likened to helping two needles in a haystack find each other.
However, this is just one of any number of different situations where a stigmergic system can be applied. To suggest a few examples:
Any health information system
Any kind of technical information system
Company information systems
Any kind of market
Employment vacancies and availability
Real estate sales
Educational sources and opportunities
Maintenance and service information systems
Note: since this talk was given, it was decided that creating a database that looked like a map of the world would be unnecessarily complicated. Thus, the meeting rooms described in this talk eventually took on an appearance similar to the Kempelen Boxes shown in the 'Example' found in the menubar at the top of this page.
Many of these Kempelen Boxes, each dealing with a different kind of cancer, are linked together by means of a dynamic, hierarchical navigation framework, which 'grows' and self-organizes as the living database expands.