The traditional file system interface is augmented with a uniform, logical interface for secure, scalable, distributed information sharing. The script may transfer log-on information, including user ID and password, from the digital library system to the external KOS in order to provide access to the Web-enabled database.
In the case of a more direct link, the access may be by URL. However, the use of a URL as the link has the same problem with persistence as does direct access via a URL from a browser. It is important to determine how often the URLs in the KOS change, whether there is a means of notification of these changes, and whether it is possible to consider an alternative that would be more persistent.
Schemes such as the Digital Object Identifier and the Persistent URL have been devised to enable resources to be physically moved among servers without having their names changed. The benefit of linking to a remote resource is that the resource will always be up-to-date. The maintenance of the KOS is in the hands of the owner, not the digital librarian.
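The indirection behind such schemes can be sketched as a resolver table that maps a stable identifier to the resource's current location; the identifier and URLs below are hypothetical examples, not real registrations.

```python
# A minimal sketch of persistent-identifier resolution: the stable name is
# looked up in a resolver table, which returns the resource's current URL.
# All identifiers and URLs here are hypothetical.

resolver_table = {
    "doi:10.1000/example.123": "https://serverA.example.org/docs/paper.pdf",
}

def resolve(identifier: str) -> str:
    """Return the current location for a persistent identifier."""
    return resolver_table[identifier]

def relocate(identifier: str, new_url: str) -> None:
    """Move the resource: only the resolver table changes, never the name."""
    resolver_table[identifier] = new_url

# The resource moves to a new server, but clients keep using the same name.
relocate("doi:10.1000/example.123", "https://serverB.example.org/archive/paper.pdf")
print(resolve("doi:10.1000/example.123"))
```

Because links in the digital library store only the persistent name, relocating the resource requires a single update at the resolver rather than edits to every referring document.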
It may also be more apparent to users that the KOS is not owned by the digital library. Linking to a remote KOS also has disadvantages.
Persistence and unexpected changes in the organization and content of the system may cause problems. The software or telecommunications route between the digital library server and the KOS may be unreliable. In systems requiring fast response time or large amounts of data transfer, and therefore high bandwidth (such as full-motion video or detailed graphics), the fact that a connection must be made between the digital library and the external KOS may make the system unacceptable to the user.
Alternatively, the KOS may be obtained from the owner and loaded locally. In many cases, this requires licensing that may not be required when the KOS is accessed remotely, because a copy of the whole resource is being provided to the digital library. Loading a KOS locally also requires that one consider issues such as maintenance, local system administration, and disk storage. If the KOS uses special software, such as a database management system, loading the KOS locally will require a copy of that software, which may require additional purchase or licensing.
Other considerations are the need for firewalls and interface design. On the positive side, the KOS is under more local control. Therefore, it may be possible to improve the response time by not accessing the KOS over the Internet. If the KOS is to be used behind the scenes (that is, the system is not visible to the user), concerns of speed and integration become more important.
If additional modifications, including digitization, need to be made to the KOS to integrate it with the digital library, it will also be necessary to load the KOS locally. If the digital library intends to incorporate numerous secondary KOSs, it is important to consider the degree to which the architecture is scalable. The Unified Medical Language System (UMLS), for example, incorporates many source vocabularies; while its main purpose has been to develop a metathesaurus for moving among these vocabularies, the management of the systems, regardless of the mapping issues, has been a major consideration.
Ingest has been a major concern, with the need to develop a system that can handle a variety of input formats, from ASCII text files to highly structured database output. The architecture must also accommodate the character sets of the incoming sources. This is particularly important if a mark-up language has been used to represent special characters and diacritical marks. Since many digital library systems are being built as extensions or applications of existing integrated library systems (ILSs), it is important to consider how the KOSs will integrate with the library system.
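As a rough illustration of these ingest concerns, the sketch below dispatches on input format and decodes mark-up character entities used for diacritics; the format names and parsers are assumptions, not taken from any particular system.

```python
# A sketch of an ingest step that dispatches on input format and normalizes
# character entities to Unicode. Format names and sample terms are illustrative.
import html

def parse_ascii(text: str) -> list[str]:
    """One term per line in a plain ASCII file."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def parse_marked_up(text: str) -> list[str]:
    """Terms whose diacritics are encoded as entities (e.g. &eacute;)."""
    return [html.unescape(line.strip()) for line in text.splitlines() if line.strip()]

PARSERS = {"ascii": parse_ascii, "markup": parse_marked_up}

def ingest(text: str, fmt: str) -> list[str]:
    """Route an incoming source to the parser registered for its format."""
    return PARSERS[fmt](text)

print(ingest("caf&eacute;\nr&eacute;sum&eacute;", "markup"))
```

A registry of parsers like this keeps the ingest pipeline open to new formats: supporting another source is a matter of adding one entry rather than rewriting the loader.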
Unfortunately, many ILS vendors have not considered links to external files or databases in their system designs. In some cases, the vendor may require that the information be stored in the proprietary format of the ILS. The system may require that the files be on the same directory or server as the accessing ILS. The fields that can be linked to the Web or searched may be limited. Outside communications may require Z39.50. With relatively closed systems, ILSs may be a difficult environment in which to implement alternative and nontraditional KOSs. Digital libraries that are interested in using KOSs should consider this integration when developing requirements for the procurement of a system to support them.
Vendors should be encouraged to support relatively open architectures and to consider the extension of traditional library systems to support broader digital library functionality. In addition to these immediate concerns, it is important to consider the incorporation of future KOSs. Initial success may spur the desire for integration of additional KOSs or enhanced functionality for the existing KOS.
Success may breed additional requirements and increase the strain on hardware, software, and network architectures. For a digital library, an outdated KOS can be more of a hindrance than a benefit. Maintenance, both of content and of the system, should be considered when planning a KOS. This is particularly important if the digital library is to be self-supporting or revenue generating.
Version control of the KOS is extremely important. If there has been significant transformation or processing of the original KOS, it may be difficult, or impossible, to reload the original and recreate the changes that have been made. A transaction-based approach, whereby only changes are transferred between the KOS provider and the library, is also possible; however, this requires that the system provider have the infrastructure, both machine and human, to produce these transactions. It also requires that the changes to the original KOS be identifiable in order to create change transactions.
However, the changes are often not indicated with enough detail to support automatic change transactions in the UMLS. If a change date, for example, is recorded only at the level of the concept record, it is impossible to tell whether the term has changed (a correction of a typographic error, for example) or whether the relationship between this concept and another concept has changed.
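A minimal sketch of deriving change transactions, assuming hypothetical field-level concept records, shows why a date stamped only on the whole record is insufficient: the diff must be taken field by field.

```python
# A sketch of deriving field-level change transactions from two versions of
# a concept record. The record fields are hypothetical; the point is that a
# change date on the whole record cannot say which field actually changed.

def change_transactions(old: dict, new: dict) -> list[tuple]:
    """Return (field, old_value, new_value) for every field that differs."""
    txns = []
    for field in sorted(set(old) | set(new)):
        if old.get(field) != new.get(field):
            txns.append((field, old.get(field), new.get(field)))
    return txns

old_rec = {"term": "colour theory", "broader": "design"}
new_rec = {"term": "color theory", "broader": "design"}
print(change_transactions(old_rec, new_rec))
```

Here the transaction makes clear that only the term was corrected and the relationship (`broader`) is untouched, which is exactly the distinction a record-level change date cannot capture.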
Section 3 lists three collaborative applications that are typical of those observed in wide-area file systems. Section 4 lists several issues that affect file-system based collaboration. Section 5 proposes several new file system services, taken from digital library technology, that add missing information management operations. Section 6 summarizes efforts in this area. Section 7 concludes. AFS, the Andrew File System, serves as an example of the facilities typically available in a wide-area file system [ Howard88 , Spector89 ].
Using a set of trusted servers, AFS presents to clients a location-transparent, hierarchical name space. This means that a user operates with a common directory structure whether accessing files from his Unix workstation in Pittsburgh or the personal computer in his satellite office in Tokyo. An AFS volume consists of a set of files and directories located on one server and forms a partial subtree of the shared name space [ Sidebotham86 ].
The distribution of volumes across servers is an administrative decision. To balance the load among a collection of servers, an administrator can migrate busy volumes from one server to another. Volumes that are frequently read but rarely modified such as system binaries may have read-only replicas at multiple servers to enhance availability and to distribute server load. Since the name of a file does not depend on the server where it is stored, volume migration and replication improve availability and reduce server load without changes to the user's view of the file system.
Figure 1. An example of the global name space shared by AFS users. AFS uses an aggressive file caching policy to reduce the network load and access latency [ Kazar88 ]. When a user accesses a file, the wide-area file system first checks the local disk cache for a copy of the file. With a typical file access, a user has a "working set" of files that remains consistent for a period of time.
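The whole-file caching described here can be sketched as follows; the in-memory dictionaries stand in for the local disk cache and the remote server, and the path is hypothetical.

```python
# A sketch of whole-file client caching in the style described for AFS:
# check the local disk cache first, fetch from the server only on a miss.
# The dictionaries are stand-ins for the disk cache and the remote store.

cache: dict[str, bytes] = {}
server_files = {"/afs/example.org/doc/readme": b"hello"}  # hypothetical store

def read_file(path: str) -> bytes:
    if path in cache:            # cache hit: served locally, no network traffic
        return cache[path]
    data = server_files[path]    # cache miss: fetch the whole file
    cache[path] = data
    return data

read_file("/afs/example.org/doc/readme")   # first access: miss, fetched
read_file("/afs/example.org/doc/readme")   # second access: hit
```

Because a user's working set stays stable for a period of time, most reads after the first are hits, which is what makes the aggressive policy pay off.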
Security in a wide-area file system is founded on an authentication mechanism and secure RPC between servers and clients. While all participating sites have to agree on the common protection and authorization model, each site has full control in implementing individual security policies. AFS uses access control lists for protection. An access control list is a set of pairs; the first item in each pair is the name of a user or a group, and the second is the set of rights granted to that user or group. Users are allowed to create new groups and also to specify negative rights.
This authorization model allows fine grain specification of access control rights for every user and every part of the wide-area file system.
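A minimal sketch of evaluating such a list follows, assuming hypothetical group memberships and using a "-" prefix to mark negative rights (the real AFS rights notation differs).

```python
# A sketch of ACL evaluation with groups and negative rights: the ACL is a
# list of (name, rights) pairs, a name may be a group, and a "-" prefix on a
# right revokes it. Group names and rights strings are illustrative.

groups = {"staff": {"alice", "bob"}}

acl = [
    ("staff", {"read", "write"}),   # positive entry for a group
    ("bob", {"-write"}),            # negative entry revokes write for bob
]

def rights_for(user: str) -> set[str]:
    """Collect granted and denied rights, then subtract the denials."""
    granted, denied = set(), set()
    for name, rights in acl:
        if name == user or user in groups.get(name, set()):
            for r in rights:
                (denied if r.startswith("-") else granted).add(r.lstrip("-"))
    return granted - denied

print(rights_for("alice"))
print(rights_for("bob"))
```

Alice keeps both rights through her group membership, while the negative entry strips write access from Bob without requiring his removal from the group.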
For performance reasons, the granularity of protection is an entire directory rather than individual files. AFS supports multiple administrative cells, each with its own servers, clients, system administrators, and users. Each cell is a completely autonomous environment, but a federation of cells can cooperate in presenting users with a uniform, seamless file name space. For example, Figure 1 shows a fictitious name space for a federation of three organizations. At the time of writing, many organizations around the world are part of the publicly accessible AFS wide-area distributed file system, and many others participate in corporate federations.
This section describes common file sharing activities among customers of wide-area file system technology. These examples demonstrate three kinds of collaboration that wide-area file system technology facilitates: collaborative administration, focused collaboration, and dissemination of shared information. Consider first the administration of a uniform computing environment. The task is truly daunting if the requirement is that a user can sit at any machine anywhere in the internationally distributed organization and operate in the same application and data environment.
One way to ensure uniformity is to install all software on the disk attached to each computer. Tools to replicate files and disk images simplify this task for small organizations. However, for large distributed enterprises the administrative overhead of ensuring consistency among all systems is significant, and there will always be periods where there are inconsistencies between machines that violate the uniformity requirement.
In contrast, an enterprise-wide software repository stored in a wide-area file system provides a single point of administration, and instant access to new software configurations. Rather than install software on the local disk of each computer, the package is installed in the shared file system.
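One way the shared repository keeps every client current can be sketched as a callback scheme, modeled loosely on the invalidation behavior this section describes; the classes and paths are illustrative assumptions, not AFS internals.

```python
# A sketch of callback-style cache invalidation: when a file in the shared
# repository is updated on the server, registered client caches flush their
# copy and reload it on the next read. Classes and paths are illustrative.

class Server:
    def __init__(self):
        self.files: dict[str, bytes] = {}
        self.clients: list["ClientCache"] = []

    def install(self, path: str, data: bytes) -> None:
        self.files[path] = data
        for client in self.clients:     # break callbacks on update
            client.invalidate(path)

class ClientCache:
    def __init__(self, server: Server):
        self.server = server
        self.cache: dict[str, bytes] = {}
        server.clients.append(self)     # register for callbacks

    def invalidate(self, path: str) -> None:
        self.cache.pop(path, None)      # flush the stale copy

    def read(self, path: str) -> bytes:
        if path not in self.cache:
            self.cache[path] = self.server.files[path]
        return self.cache[path]

srv = Server()
srv.install("/repo/tool", b"v1")
client = ClientCache(srv)
client.read("/repo/tool")          # caches v1
srv.install("/repo/tool", b"v2")   # invalidates the cached copy
print(client.read("/repo/tool"))   # next read fetches v2
```
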
The file system cache on each machine notices the change, flushes the old version, and loads the new. At no time is the uniformity requirement violated. While a shared software repository simplifies the problem of deploying software configurations, it does not help administrators decide what should be contained in the repository. Dependencies between packages, for example, complicate such decisions. Once the decision is made to remove a package, the administrator must identify the various pieces of the package.
This is complicated by the common approach to installation, in which the files of a single package are placed in several different directories. Third-party installation utilities solve some of these problems through an application-specific "Uninstall" facility. The software development environment at Transarc exemplifies properties of this form of collaboration. Employees from development, system test, documentation, training, and product support groups interact through the file system to design, develop, package, and maintain several software applications.
The collection of shared files includes source code, product documentation, training manuals, design notes, software defects, and many other kinds of files. Figure 2. Relationships between files used to fix a software defect. To understand the complexity of interactions in this environment, consider the actions that occur when a customer reports a problem with a product, as shown in Figure 2. First, the support specialist who handles the call creates a defect report and begins to search through product documentation, release notes, and other defect reports for similar problems.
Since many of the repositories of information available to the support specialist are not designed for finding solutions for software defects, locating relevant documents is a complex and time consuming process. If there are no documents that describe solutions to the defect, the support specialist might post a request to a bulletin board asking for help.
As a last resort, the defect is handed to the development organization for investigation. The developer who receives a defect report retrieves the source files from which the customer's software was created. However, since the files that contain source for a product are constantly being revised as defects are fixed and features are added, the files used to build a particular product must be reconstructed. To manage these changes, Transarc uses a version control system that tracks changes to individual source files.
Since changes to a single file often leave the system in an inconsistent state, the version control system identifies batches of changes that take the collection of source files from one stable state to another. To fix the defect, the developer examines the defect report and any explanatory material that is associated with it, initiates a change in the state of the product source code, fixes the problem, and indicates that the new state is stable. Since developers frequently change the same files in parallel, it is necessary to merge stable states.
The developer responsible for the merge examines the changes, initiates a state change, resolves the conflicts, and marks the state as stable. At this point, the newly created stable state is given to the system test group and, if it passes the regression tests, to product engineering where a patch is created. The patch is added to the repository of patches and the original product support specialist is notified.
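The stable-state mechanism described above can be sketched as follows, with individual edits accumulating in a changeset that becomes visible only when marked stable; the class and file names are hypothetical.

```python
# A sketch of change batching: individual file edits accumulate in a pending
# changeset, and the visible (stable) state of the source tree advances only
# when the whole changeset is marked stable. Names are illustrative.

class SourceTree:
    def __init__(self, files: dict[str, str]):
        self.stable = dict(files)           # last stable state, visible to others
        self.pending: dict[str, str] = {}   # the open changeset

    def edit(self, path: str, contents: str) -> None:
        self.pending[path] = contents       # edits accumulate, unseen by others

    def mark_stable(self) -> None:
        self.stable.update(self.pending)    # advance the stable state as a batch
        self.pending.clear()

tree = SourceTree({"main.c": "v1", "util.c": "v1"})
tree.edit("main.c", "v2-fixed")
tree.edit("util.c", "v2-fixed")
assert tree.stable["main.c"] == "v1"        # in-progress edits stay invisible
tree.mark_stable()
print(tree.stable["main.c"])
```

Batching the two edits means no observer ever sees `main.c` fixed while `util.c` is not, which is the inconsistency the version control system is designed to avoid.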
When the customer finally retrieves the patch, the defect report is marked "closed". The participants in this process make extensive use of the shared file system. All relevant documents are stored as files, including bulletin board posts. Developers use a personal "sandbox" to modify files so that stable versions of the source code are isolated. System test and product engineering receive references to stable configurations of the source code. Handbooks, earnings statements, expense reports, presentations, minutes of meetings, white papers, and many other pieces of corporate information represent a large corpus of data with diverse formats, access restrictions, and distribution characteristics.
Electronic mail is one way to provide this information to employees. As push technology, electronic mail is useful as a notification agent.
However, several benefits come from archiving corporate information in a wide-area file system. First, keeping a single copy of each file in the file system is more efficient than keeping a copy on each machine.
Second, file system administrators ensure that backups are performed regularly. Most users, however, do not back up files consistently. Finally, access controls placed on files ensure that confidential information is protected from unauthorized access. As with backups, file system administrators can create drop-off locations for files that require special protection, so that users need not worry about writing the correct access controls. The presence of a universally accessible shared file system simplifies document publishing (especially when the HTTP server is configured to pull documents directly from the shared file system) and allows some of the load for accessing documents to be off-loaded to the file system.
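The resulting division of labor, with static documents served straight from the shared file system and dynamic requests left to the HTTP server, might be sketched as follows; the paths and the CGI stand-in are assumptions.

```python
# A sketch of request routing when an HTTP server sits in front of a shared
# file system: static documents are read straight from the file system, while
# dynamic paths (e.g. CGI) are processed by the server. Paths are hypothetical.

shared_fs = {"/afs/example.org/web/index.html": "<html>hello</html>"}

def run_cgi(path: str) -> str:
    """Stand-in for executing a CGI script; not a real CGI implementation."""
    return "dynamic output for " + path

def handle_request(url_path: str) -> str:
    if url_path.startswith("/cgi-bin/"):
        return run_cgi(url_path)                          # server-processed
    return shared_fs["/afs/example.org/web" + url_path]   # file-system read

print(handle_request("/index.html"))
```

Under this split, every static fetch is absorbed by the file system's own caching and replication, and the HTTP server's capacity is reserved for requests that genuinely need computation.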
In particular, requests for files can be handled efficiently by the file system [ Spasojevic94 ]. The HTTP server continues to process requests for dynamic documents, such as the output of CGI scripts and files merged with server-side includes. However, there are limitations in the file system interface that affect the construction and integration of tools for collaboration.
In particular, file system technology focuses on robust and efficient storage and archival of files but provides very few facilities for handling content. This section lists several issues commonly faced by those building and using tools for collaboration. Because wide-area file systems are designed to serve entire organizations, deployed file systems are typically very large.
For example, the employees of Transarc access gigabytes of shared files from several offices in the United States and two international offices. The collection of files at Transarc is one of many collections available through the public AFS name space. During studies three years earlier, when the number of participating organizations was 80, the total amount of file system data available through AFS was measured to be approximately 5 terabytes [ Spasojevic96 ]. Scale introduces many system-level problems. However, user-level problems remain unaddressed.
In particular, finding files is a key limitation to collaboration. Name space manipulation -- placing a file in a particular directory -- is the only method that the file system provides to simplify file location. While this approach works well for managing small collections, there are limitations to its usefulness for organizing large collections. First, the name space in an enterprise-wide file system is vast.
User directories at Transarc start four directories deep. Product development source trees frequently reach depths in excess of ten directories. In this environment, finding files through interactive browsing is extremely difficult without additional information. Second, since a file can be placed only in a single directory, its location in the name space can represent only one aspect of the file. This restriction frequently causes problems when collections of related files must be placed in separate directories to accommodate existing hierarchies, as is the case when installing many software packages.
Documentation is placed in one directory, binaries in another, shared libraries in a third, and configuration files in a fourth.
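One remedy suggested by digital library practice is to attach attributes to files, so that a package scattered across directories can still be found as a unit; the catalog below is a hypothetical sketch of that idea, not a real file system service.

```python
# A sketch of attribute-based lookup layered over a directory hierarchy:
# each file carries a "package" attribute, so all pieces of a package can be
# found even though they live in four different directories. Paths and
# package names are illustrative.

catalog = {
    "/usr/share/doc/foo/manual.txt": {"package": "foo", "kind": "doc"},
    "/usr/bin/foo":                  {"package": "foo", "kind": "binary"},
    "/usr/lib/libfoo.so":            {"package": "foo", "kind": "library"},
    "/etc/foo.conf":                 {"package": "foo", "kind": "config"},
    "/usr/bin/bar":                  {"package": "bar", "kind": "binary"},
}

def files_of(package: str) -> list[str]:
    """Return every path tagged with the given package, wherever it lives."""
    return sorted(p for p, attrs in catalog.items() if attrs["package"] == package)

print(files_of("foo"))
```

With such a catalog, removing a package reduces to one attribute query instead of a hunt through documentation, binary, library, and configuration directories.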