Version 1.4
Authors
C. Larsson, N. Wedi, H. Thiemann
The purpose of the PRISM system is to
enable users to perform numerical experiments, coupling
interchangeable model components, e.g. atmosphere, ocean, biosphere,
chemistry etc., using standardised interfaces as outlined in
section (I.3). The general architecture provides the infrastructure
to configure, submit, monitor and subsequently postprocess, archive
and diagnose the results of these coupled model experiments.
There is an emphasis on choosing an architectural design that
allows these activities to be done remotely, i.e. without the user
being physically present where the numerical computations take
place.
The required features of the PRISM system are analysed with
respect to the processes involved, the actions they take and where
they happen.
The general architecture influences the security models that can
be applied. It defines the possible operations of a user from a
remote site. An architectural design is presented that satisfies
the security and configurability demands required by all processes.
Existing technologies are investigated to assess the constraining
implications of their use.
The cost and capacity of computing and network technologies are
expected to keep changing. Therefore, it is important to
design an adaptable and scalable architecture.
An experiment is an ensemble of tasks running
on a supercomputer, defined by a configuration process. Three
levels of communication exist within such a coupled experiment:
A task is an individual job step of an
experiment that needs to be executed. Fig 1 is a hierarchical view
of an experiment with the tool Xcdp showing a
collection of tasks. The different boxes represent different tasks
and the colour code shows the status of the task. Note that the
coupled model, i.e. coupler and component models, represents a single
task of an experiment.
Figure 1: Xcdp graphical view of an experiment. The status of each task is shown as a colour in an overview of the experiment.
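The relationship between an experiment and its tasks can be sketched as a small data structure. This is only an illustration: the class names, the status values and the task names are assumptions made for the example and are not taken from SMS, Xcdp or the PRISM specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the actual Xcdp colour codes are not listed in this document.
    QUEUED = "queued"
    ACTIVE = "active"
    COMPLETE = "complete"
    ABORTED = "aborted"


@dataclass
class Task:
    """An individual job step of an experiment."""
    name: str
    status: Status = Status.QUEUED


@dataclass
class Experiment:
    """An ensemble of tasks defined by a configuration process."""
    name: str
    tasks: list[Task] = field(default_factory=list)

    def summary(self) -> dict[str, int]:
        """Count the tasks in each status, as an overview display might."""
        counts: dict[str, int] = {}
        for task in self.tasks:
            counts[task.status.value] = counts.get(task.status.value, 0) + 1
        return counts


experiment = Experiment(
    name="demo",
    tasks=[
        Task("preprocess", Status.COMPLETE),
        # The coupler and all component models together form a single task.
        Task("coupled_model", Status.ACTIVE),
        Task("postprocess"),
        Task("archive"),
    ],
)
```

Calling `experiment.summary()` gives the per-status counts that a monitoring view would render as colours.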
One major objective of the PRISM project is to develop a standard interface for each possible pair of models constituting the global climate system. The models will exchange information through standard interfaces with a universal coupler or directly with the other model components. The coupler is the program responsible for controlling the coupled model formed by the different component models, and controlling the exchanges and transformations of physical data between them. This is detailed further in section II.2.
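The role of the coupler can be illustrated with a minimal sketch in which two component models exchange a field through a coupler that applies a transformation on the way. The `Component` and `Coupler` classes, the field name and the unit conversion are all hypothetical; the actual PRISM interfaces are those described in section II.2.

```python
from typing import Callable

Field = list[float]


class Component:
    """A component model exposing named fields (hypothetical interface)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fields: dict[str, Field] = {}


class Coupler:
    """Controls exchanges and transformations of fields between components."""

    def __init__(self) -> None:
        self.components: dict[str, Component] = {}

    def register(self, component: Component) -> None:
        self.components[component.name] = component

    def exchange(
        self,
        source: str,
        target: str,
        field_name: str,
        transform: Callable[[Field], Field] = list,
    ) -> None:
        """Copy a field from source to target, transforming it on the way."""
        data = self.components[source].fields[field_name]
        self.components[target].fields[field_name] = transform(data)


def kelvin_to_celsius(values: Field) -> Field:
    # Stand-in for a real transformation such as regridding or unit conversion.
    return [v - 273.15 for v in values]


ocean = Component("ocean")
atmosphere = Component("atmosphere")
ocean.fields["sea_surface_temperature"] = [280.15, 290.15]

coupler = Coupler()
coupler.register(ocean)
coupler.register(atmosphere)
coupler.exchange("ocean", "atmosphere", "sea_surface_temperature", kelvin_to_celsius)
```

Because both components only talk to the coupler, either one could be replaced by another model that provides the same named fields.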
Three basic phases of configuration can be
identified:
The system should allow the domain activities to happen by remote access, i.e. the users do not have to be physically in the place where the model is executed or the data is located. The interaction between the users and the system takes place through a user interface. This interface establishes the identity of the user and gives access to the system's functionality. The functionality is provided by a number of specialised servers accessed by the client user interface (UI), detailed in section II.4.
An administrator will be provided by each
institution maintaining and developing a particular model component
of the coupled PRISM system. The administrator has several
tasks:
The experiment results in the output of data fields, statistics and diagnostics. This output needs to be archived, catalogued and made accessible to the modeller who will need to visualise the diagnostics and data to understand the results. It is the task of the archiving and data management system to accomplish this and the details are explained in the section II.3 documents.
The partitioning of functionality allowing a client to perform operations outside its own capability with the help of a more powerful server is called client/server computing. The client and the server may reside on physically different computers and they communicate by accessing computer networks.
Three actors on the PRISM system can be
identified: Users, developers and administrators.
Table 1: PRISM actors and their actions on the system.
| Table of PRISM actors and main activities | ||
| Actor | Main activity and interaction with system | Acts on |
| PRISM administrator | Executes administration tasks | Definitions of administration entities |
| PRISM administrator | Provides definitions of all entities | Model component interfaces and metadata |
| PRISM user | Composes coupled experiments | Definitions of all entities |
| PRISM user | Visualises, queries and manages | Model results |
| PRISM developer | Develops model components | Model components |
| Table 2 | ||
The actors carry out their activities by means of a user interface. The following processes can be identified from the activities above:
| Table of client configuration processes (1) | |
| Where activity takes place | What activity |
| User interface | Configuration |
| User interface | Visualisation |
| User interface | Authentication |
| User interface | Archive query |
| User interface | Documentation |
| User interface | Configuration instance |
| User interface | Monitoring |
| Table 3 | |
| Table of configuration provider processes (2) | |
| Where activity takes place | What activity |
| Configuration server | Configuration |
| Configuration server | Configuration instances |
| Documentation server | Documentation |
| Authentication server | Authentication |
| Administration server | Model build configuration |
| Administration server | Model build configuration instance |
| Experiment database server | Configuration instances for the experiments |
| Visualisation server | Visualisation |
| Monitoring server | Start/stop/state information |
| Table 4 | |
| Table of execution processes (3) | |
| Where activity takes place | What activity |
| Scheduling server | Configuration instances |
| Execution server | Coupled model (coupler + component models) |
| Execution server | Data pre/post processing |
| Archiving server | Archiving |
| Table 5 | |
The client configuration process (UI) is
accessed through the Internet. The configuration provider processes
are accessed through a central site but the services can be
distributed to other sites without functional difference. The
execution process is local to the model provider. This is described
as directory centric, web enabled and distributed from local PRISM
sites.
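The directory centric idea, in which clients look up where a service is provided instead of assuming a fixed location, can be sketched as follows. The `ServiceDirectory` class, the site names and the URLs are assumptions made for the example; the service names follow Tables 4 and 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    service: str  # e.g. "configuration", "monitoring", "execution"
    site: str     # "central" or the name of a local PRISM site
    url: str      # hypothetical endpoint


class ServiceDirectory:
    """Central directory in which sites publish the services they provide."""

    def __init__(self) -> None:
        self._entries: dict[str, list[ServiceEntry]] = {}

    def publish(self, entry: ServiceEntry) -> None:
        self._entries.setdefault(entry.service, []).append(entry)

    def lookup(self, service: str, preferred_site: str = "central") -> ServiceEntry:
        """Return the provider at the preferred site, or else the first one published."""
        candidates = self._entries.get(service, [])
        if not candidates:
            raise LookupError(f"no provider published for {service!r}")
        for entry in candidates:
            if entry.site == preferred_site:
                return entry
        return candidates[0]


directory = ServiceDirectory()
directory.publish(ServiceEntry("configuration", "central", "https://central.example.org/config"))
directory.publish(ServiceEntry("configuration", "site-a", "https://site-a.example.org/config"))
directory.publish(ServiceEntry("execution", "site-a", "https://site-a.example.org/exec"))

config_provider = directory.lookup("configuration")
execution_provider = directory.lookup("execution")
```

In this sketch the configuration service resolves to the central site while the execution service resolves to the local site, since only the local site publishes it. Distributing a service to another site changes a directory entry, not the client.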
Variations of the architecture are shown below in three figures A, B and C.
The figures show a Central PRISM site and a Local PRISM site. The
local site is where the execution, scheduling and archiving servers
are located. The central site is one of the participating PRISM
sites; it hosts the configuration provider processes listed in Table 4,
which are used by all client processes. The component boxes
in the figures represent the following:
Figure 2: Central site architecture, directory centric.
Figure 3: Common data architecture, model provider centric.
The data in the system consists of:
Interoperating components in collaborative
and distributed environments are deployed and maintained by
multiple administrators, and thus upgrades and maintenance are likely
to be uncoordinated. With no central authority to plan and execute
upgrades, some clients will always be out of synchronisation. This
applies to the domain model, i.e. the scientific model, as well as
the computing model, i.e. the infrastructure components such as application servers.
For the domain model, this problem should be addressed by allowing
all software (component model) providers to act as administrators,
so that each of them can initiate the distribution of upgrades to
all sites.
For the computing model, i.e. infrastructure software such as service
providers, the problem is further complicated by the fact that
services can call each other. A service should:
Figure 5: PRISM software deployment model. A central service provider deploys new versions of components on local sites.
System security is made up of the following
components:
There should be an access model enforced in
the system to ensure proper authorisation. The level of control is
to be set to be practical in terms of administration and
confidentiality and to be determined by the PRISM partners.
The level of integrity in system transmissions is to be high, but
confidentiality is less important, as messages will mainly
consist of configuration information, which is of little use without
access to the software being configured.
The service to service authentication needs to be solved in a
scalable manner consistent with the administration resources
available.
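The distinction made above between integrity and confidentiality can be illustrated with a message that is sent in clear text but carries an authentication tag, so that any modification in transit is detected. The sketch below uses an HMAC with a shared key purely for illustration; this is an assumption of the example, not the mechanism chosen for PRISM, where certificates and SSL are the candidates discussed. The function names and message fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign(message: dict, key: bytes) -> dict:
    """Wrap a configuration message with an HMAC-SHA256 integrity tag.

    The body stays readable (no confidentiality); only tampering is detected.
    """
    body = json.dumps(message, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify(envelope: dict, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, envelope["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["tag"])


shared_key = b"example-key-for-illustration-only"
envelope = sign({"experiment": "demo", "resolution": "T42"}, shared_key)

# A message altered in transit keeps the old tag and fails verification.
tampered = dict(envelope, body=envelope["body"].replace("T42", "T106"))
```

Here `verify(envelope, shared_key)` succeeds and `verify(tampered, shared_key)` fails. A shared key per pair of services does not scale well, which is one reason why certificate based schemes are attractive for service to service authentication.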
The PALM project aims to provide a general
structure for a modular implementation of a data assimilation
system. In this system, a data assimilation algorithm is split up
into elementary "units" such as the observation operator, the
computation of the correlation matrix of observational errors, the
forecast model, etc. PALM ensures the synchronization of the units
and drives the communication of the fields exchanged by the units
and performs elementary algebra if required. This goal has to be
achieved without a significant loss of performance compared to
a standard implementation. It is therefore necessary to design the
PALM software in view of the following objectives and
constraints:
UNICORE is a metacomputing framework based on an Abstract Job Definition that can be submitted to different sites from Java clients. Gateways receive the jobs, then translate and schedule the definition for execution on the available hosts. Security is based on certificates. The client is downloaded once together with the definitions. Developing new job definition interfaces (plugins) requires programming skills and considerable effort. UNICORE mostly lacks comprehensive scheduling and monitoring mechanisms and is not yet used in a production environment.
Quoting from www.globus.org:
The Globus Project is a multi-institutional research and
development effort creating fundamental technologies for
computational grids. Grids are persistent environments that enable
software applications to integrate instruments, displays,
computational and information resources that are managed by diverse
organisations in widespread locations. A primary product of the
Globus Project is the open source Globus Toolkit, which is being
used in numerous large Grid deployment and application projects in
the United States, Europe, and around the world.
Parts of the Globus project software relate to the tasks at hand
in PRISM, such as security mechanisms (certificates), resource
lookup, scheduling and Message Passing (MP) technology. A benefit is
that many important institutions and commercial interests are
supporting the Globus initiative.
PrepIFS is an interactive meteorological
application to prepare research experiments using the integrated
forecasting system (IFS) at ECMWF. Both researchers at ECMWF and
scientists in institutions anywhere in Europe (subject to prior
permission) can access the complex computer environment at ECMWF
via the Java application PrepIFS or via the Internet using the
Java applet PrepIFS and any standard WWW browser.
Forecast and analysis experiments can be prepared and submitted
remotely.
The system uses a combination of web servers and application
brokers/directories/providers to communicate with the preparation
client application which contains functionality to validate the
prepared experiment before it is submitted for processing.
Supervisor Monitor Scheduler (SMS) is an application that enables
users to run a large number of programs, which may have dependencies
on one another and on time, in a controlled environment with
reasonable tolerance of both hardware and software failures,
combined with good restart capabilities. SMS submits tasks and
receives acknowledgements from the tasks when they change status
and when they send events. SMS knows the relationships between
tasks, and is able to submit dependent tasks when a given task
changes its status, for example when it finishes. An associated
application, Xcdp, allows users to monitor and
change jobs in the scheduler through a GUI. The scheduling application
is currently only used within local area networks.
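The dependency driven behaviour described above, in which a task is submitted once the tasks it depends on have reported completion, can be sketched as follows. This is not the SMS implementation or its interface; the `Scheduler` class, its methods and the task names are hypothetical, and time dependencies, events and failure handling are left out.

```python
class Scheduler:
    """Submits each task once all of its dependencies have completed."""

    def __init__(self, dependencies: dict[str, set[str]]) -> None:
        # Maps each task name to the set of task names it depends on.
        self.dependencies = dependencies
        self.complete: set[str] = set()
        self.submitted: list[str] = []
        self._submit_ready()

    def _submit_ready(self) -> None:
        # Sorted iteration keeps the submission order deterministic.
        for task in sorted(self.dependencies):
            ready = self.dependencies[task] <= self.complete
            if ready and task not in self.submitted:
                self.submitted.append(task)

    def acknowledge_complete(self, task: str) -> None:
        """Called when a task reports that it has finished."""
        if task not in self.submitted:
            raise ValueError(f"{task!r} was never submitted")
        self.complete.add(task)
        self._submit_ready()


scheduler = Scheduler(
    {
        "preprocess": set(),
        "coupled_model": {"preprocess"},
        "postprocess": {"coupled_model"},
        "archive": {"postprocess"},
    }
)
first_submission = list(scheduler.submitted)

scheduler.acknowledge_complete("preprocess")
second_submission = list(scheduler.submitted)
```

Only the task without dependencies is submitted at start-up; the coupled model follows as soon as the preprocessing task acknowledges completion.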
SSH (Secure Shell) is an authenticated protocol for secure remote host access. It supports public and private key authentication and encrypts transferred data. It provides commands for file transfer and remote login and may be a useful tool for administration.
It has been suggested (ref 1) that the rate of technology change, i.e. the time in which capacity doubles or price halves, is around 9, 12 and 18 months for networks, storage and computing power respectively. At these rates network performance doubles relative to computing power every 18 months, so that network capacity will eventually become essentially free by comparison. From this point of view it is important to select an architecture that can exploit this advantage.
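The arithmetic behind this statement can be checked with a short calculation. The doubling times are those quoted above; the function names are only illustrative, and steady exponential growth is assumed.

```python
# Doubling times in months, as quoted in the text.
DOUBLING_MONTHS = {"network": 9, "storage": 12, "computing": 18}


def growth(resource: str, months: float) -> float:
    """Capacity growth factor after the given number of months."""
    return 2.0 ** (months / DOUBLING_MONTHS[resource])


def relative_growth(resource: str, baseline: str, months: float) -> float:
    """Growth of one resource measured against another."""
    return growth(resource, months) / growth(baseline, months)


# After 18 months networks have grown fourfold and computing power twofold,
# so network capacity has doubled relative to computing power.
network_vs_computing = relative_growth("network", "computing", 18)
```

In general the relative factor after t months is 2^(t/9 - t/18) = 2^(t/18), which is where the 18 month relative doubling time comes from.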
The best designs in order to achieve remote
access, modularity and extendibility are the directory centric (A)
and the model provider centric (B) architectures as outlined in
section Proposed architecture.
The directory centric (A) architecture has the benefit that it
minimises the duplication of static or semi-static resources (e.g.
land and sea masks). It also allows central content to grow, while
local content can still be chosen if appropriate. For deployment,
PRISM sites do not need full web and application servers, which
makes management easier. Future cooperative techniques can be used
from the central site, such as client visualisation displaying on
many clients.
The drawback of the directory centric (A) architecture is that
complexity increases, as resources need to be advertised and
discovered by clients. Some concentration of processing power may
be required to serve all clients.
The final architecture will show that combinations of local and
central resources are possible as they will not compromise the
system.
The administration effort required for the central site
architecture is likely to be lower, as duplication of data and
software is not necessary and thus fewer physical copies need to
be accessed.
It is important to understand that most of the system service
communication is made over the Internet, a network over which we
do not have full control. As a result response times will vary
considerably for messages and the actions invoked through the user
interface. Recoverability is limited as it is often difficult to
diagnose where errors occur. If a certain level of performance is
deemed essential a virtual private network should be set up with a
service level agreement.
The technology that realises the proposed architecture is known as "Web Services". This includes the use of web servers, application servers, resource directories, discovery mechanisms and message services, together with Java clients and servers. For security, certificates and the Secure Sockets Layer (SSL) as well as encryption can be used. Web services as a technology is service centric, allowing clients dynamic service discovery over networks such as the Internet. It is usually deployed as a three tier system in which a front end presentation layer, such as a browser or Java client, communicates with a remote domain application (service) through a web server. The web services infrastructure benefits from the application integration of diverse software, made possible by standardisation, and from directory technologies that enable service providers to publish their services irrespective of implementation technology.
The issue of standardisation of interfaces in
complex and configurable systems becomes very important in
deploying distributed architectures for scalability, extendibility
and future success. A key factor making interoperability
between software possible is the development of XML, the eXtensible
Markup Language. This language allows for standardisation of
messages between systems, enabling clients and servers to
interoperate over networks. The development of XML promises to
standardise several other important technologies such as:
It would be possible to implement all server
components in any suitable language such as Perl or C++. Currently
there is no client software that can be used with browsers that
does not build on Java technology. From a system maintenance point
of view using one technology, Java, is the preferred way as this
simplifies the task of adhering to multiple standards. The best
choice is therefore to implement all software in the infrastructure
in Java but not to restrict it if there is a case for using other
technologies. Java supports all the mechanisms needed for
implementing web services using available standards. Other
technologies are Microsoft's DotNet and HTML. Today DotNet
technology is very new and is also proprietary in nature. The use
of an HTML client severely limits the intelligence that can be built
into the client and is therefore seen as less useful.
Other projects such as Globus have published similar ideas (Open
Grid Services Architecture) building on the web services
concept. There is no doubt that the web services concept will be the
dominant paradigm over the next 5 years and, together with
standardisation of technologies, increased network speed and
cooperative efforts, the systems that are ready for this
interaction will have an advantage.
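The use of XML as a common message format between clients and servers, as discussed above, can be illustrated with a configuration message that is built by one side and parsed by the other. The element and attribute names (`configuration`, `parameter`, `experiment`, `name`) are invented for this example and do not represent a PRISM schema. The sketch is written in Python for brevity, although the text recommends Java for the infrastructure; because the message is plain XML, the two sides need not share an implementation language.

```python
import xml.etree.ElementTree as ET


def build_message(experiment: str, settings: dict[str, str]) -> str:
    """Serialise experiment settings as an XML document (client side)."""
    root = ET.Element("configuration", attrib={"experiment": experiment})
    for name, value in sorted(settings.items()):
        parameter = ET.SubElement(root, "parameter", attrib={"name": name})
        parameter.text = value
    return ET.tostring(root, encoding="unicode")


def parse_message(document: str) -> tuple[str, dict[str, str]]:
    """Recover the experiment name and settings from the XML (server side)."""
    root = ET.fromstring(document)
    settings = {p.get("name"): p.text or "" for p in root.findall("parameter")}
    return root.get("experiment"), settings


settings = {"atmosphere": "model-a", "ocean": "model-b", "coupling_step": "3600"}
document = build_message("demo", settings)
experiment_name, received = parse_message(document)
```

The receiver recovers exactly the settings that were sent, and reserved characters in values are escaped by the XML library without any special handling in the calling code.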
| Table of risks | |||
| Risk | Risk Magnitude | Description | Impact |
| Security demands incompatible on some sites. | Severe | Multiple security solutions may be necessary. | If sites cannot agree on one security solution, costly separate solutions may have to be introduced, affecting the client experience. |
| Lack of infrastructure resources. | Severe | Hardware and software must be available for web services implementation. | Slow or nonexistent services. |
| Table 6 | |||
| Table of Definitions, Acronyms and Abbreviations | |||
| Keyword | Description | ||
| WSDL | Web Services Description Language, an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. | ||
| XML | The Extensible Markup Language is the universal format for structured documents and data on the Web. It is a human-readable, machine-understandable, general syntax for describing hierarchical data, applicable to a wide range of applications. Custom tags enable the definition, transmission, validation, and interpretation of data between applications and between organisations. | ||
| SOAP | Simple Object Access Protocol, a lightweight protocol for exchange of information in a decentralised, distributed environment. It is an XML based protocol. | ||
| Java | A programming language developed by Sun Microsystems, designed to run on any platform with a Java virtual machine; it supports web services. | ||
| SSL | Secure Sockets Layer, a secure communications protocol developed by Netscape. | ||
| SSH | Secure Shell, a secure remote access protocol enabling remote logins, originally developed by Tatu Ylönen. | ||
| Kerberos | Network security system developed by MIT. | ||
| S/Key | One-Time Password system. | ||
| UDDI | Universal Description, Discovery and Integration. Enables dynamic lookup and advertising of services. | ||
| X.509 | Standard for certificates used by SSL authentication and encryption. | ||
| Table 7 | |||