However, the technological and economic context associated with the third is in too great a state of flux for legislation to be appropriate at this stage. For example, one deposited copy of a valuable electronic publication might be made available to a wide population by network; this obviously would be a concern to its publisher.

There are also financial considerations. Whereas the marginal cost of depositing one copy of a printed publication is generally negligible, the costs associated with depositing an electronic item may be substantial. There may be costs resulting from the need to provide documentation and from the removal of copy protection, for example.

Consequently, a repository can be expected to seek a trade-off between immediate access and long term availability. Currently, our discussions centre on the levels of public access to deposited items. Given the extreme positions of (1) consultation at one terminal only and (2) unlimited and free remote access, The Library's working party has chosen a compromise.

The current proposals are considering:. There are two fundamentally different approaches, namely 1 continual migration to new media and formats and 2 provision of original support environments by hardware and software emulation. One particularly complex area is that of online databases.

Long-Term Preservation of Digital Documents: Principles and Practices

As examples of the complexity, we can point out that online databases exist in a multitude of formats; they are constantly and rapidly being changed; and they are made available in a number of forms which may or may not correspond to our understanding of "publishing" as described above. This makes them too challenging to be considered for legal deposit today, though one day all should be taken within its scope.

It is already too late to preserve some publications, yet it is too early for legal deposit to be practical. We shall have to wait until the economics of this form of publishing are less turbulent and better understood. The need for a national infrastructure of digital archives is argued. Critical issues (operating environment, migration strategies, intellectual property and finances) are examined. The paper ends with a summary of the draft report's recommendations.

By way of introduction, it is interesting to reflect on some of these benefits, lest we take them too much for granted. The entire collection was recently migrated from another computer in about one hour. It is routinely backed up in a quarter of that time. These are performance levels which we simply cannot hope to emulate with paper or non-digital formats. Benefits such as these are inherent in digital formats; they will allow us to perform some functions more easily than before, and others which we could not previously perform at all.

The task force issued a draft report on 25th August for comment. This draft will be used as a basis for this presentation. These are defined below.

Digital Archives: "Repositories of digital information that are responsible for storing and ensuring, through the exercise of various migration strategies, the long term accessibility of the nation's social, economic, cultural and intellectual heritage instantiated in digital form."

Note that this definition distinguishes digital archives from libraries. Whereas libraries have access as a main objective, archives' priorities centre around storage and preservation.

The Task Force adopted this definition instead of the concept of "refreshing" which had been used in its original terms of reference, because refreshing was felt to be insufficient in scope.

Digital Preservation: "Retaining the ability to display, retrieve, manipulate and use the digital information in the face of constantly changing technology."

This should include a number of recognised repositories. Recognition would be achieved by certification of an independent authority. A fail-safe mechanism would be needed, for example to "rescue" data if an archive closes. Other mechanisms will be needed, for example to direct data producers who find themselves unable to maintain particular data sets, and for archives to proactively seek out data sets in danger of being "orphaned". The operating environment will be conditioned by the diversity of attributes which describe it. Archives will have to contend with:.

The Task Force takes the view that owners, creators and copyright holders have the initial responsibility for archiving their data sets. This is not to say that they will look after every object which they should care for; they represent only a first line of defence. Pressure should be applied in this area, particularly on publishers. Certified archives will have a right and the responsibility to exercise aggressive rescues of endangered data sets.

Migration Strategies for Digital Information

Migration strategies will have to be developed. The nature of these strategies will depend on the relevant application environments, on the formats involved, and on the degrees of functionality sought. Migration will take into account the need to change media, change formats, and in some cases incorporate standards. It is quite possible that this will be achieved by specialist processing centres or bureaux, possibly consortium-owned. One action which would greatly ease the task of migration would be the incorporation of migration paths into new software.

Intellectual Property The relevant legislation is the US Copyright Act Section , which protects intellectual property while allowing libraries to make copies of protected material for preservation purposes. However, this legislation did not anticipate the need to copy digital documents for the same purpose; consequently legislative changes will be called for before data can be migrated with impunity. Similarly, other preservation-related activities may require the permission of copyright holders, under present rules. These activities are:.

Clearly, this would be an onerous responsibility. Requiring owner authorisation for each preservation action could undermine the effectiveness of an archives network. The Task Force therefore proposes that digital archives would not be required to seek authorisation to create a copy or to store, migrate and manage that copy. Intellectual property owners would retain control over the making of new copies in other circumstances, however. We also propose that any work which is not protected intellectual property can be accessed, used and disseminated according to the terms of the archive; but for any work which is thus protected, the actions would require agreement of the rights holder.

The questions of cost, and relative cost, are complex. A large number of interrelated factors contribute to the cost of digital archiving, namely the costs of:. Cost models therefore must include consideration of the functions which are included. They must also allow for predictions of change in technologies and costs over long time periods. Trends caused by the archives themselves will also affect the costs: for example, once operations are routine and predictable, we can anticipate that unit costs would greatly decrease; and there may be some economies of scale.
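The need for a cost model to allow for change over long periods can be illustrated with a small projection. This is only a sketch under stated assumptions: the function name, its parameters and the figures in the usage note are hypothetical and are not taken from the Task Force report. It assumes a yearly storage cost that falls by a fixed fraction each year, plus a fixed migration cost incurred at regular intervals.

```python
def projected_cost(years, annual_storage_cost, annual_decline,
                   migration_cost, migration_interval):
    """Sum storage and migration costs over a planning horizon.

    All parameters are illustrative assumptions:
    - annual_storage_cost: cost of storage in year 0
    - annual_decline: fraction by which storage cost falls each year
    - migration_cost: fixed cost of one migration
    - migration_interval: number of years between migrations
    """
    total = 0.0
    for year in range(years):
        # Storage becomes cheaper each year by the assumed fraction.
        total += annual_storage_cost * (1 - annual_decline) ** year
        # A migration is charged every migration_interval years, not in year 0.
        if year > 0 and year % migration_interval == 0:
            total += migration_cost
    return total
```

For instance, `projected_cost(3, 100.0, 0.5, 50.0, 2)` adds storage costs of 100, 50 and 25 to one migration in year 2. A realistic model would also need the revenue side and the functions an archive chooses to offer, which this sketch leaves out.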

The costs of an archive must of course be matched by its revenues. A model will need to allow for potential income from tax and accounting incentives, user fees and subscriptions. At this point, the recommendations include the following:. The final report will then be issued, and we look forward to the recommendations being implemented, hopefully in To subscribe to the listserv send the message subscribe archtf-l to listserv yalevm. First: what is a preservation policy for digital material? Second: does it differ from a preservation policy for "conventional" library and archive material and, if so, in what way?

It examines many detailed issues related to these two fundamental questions, including the influences of other factors (collection purpose, format, medium, etc.) on preservation policy. The Oxford English Dictionary puts it more succinctly as the art of "keeping safe", "keeping alive", "maintaining" and "retaining". In a digital context, we have to look anew at this definition, as an extra dimension has to be taken into account.

According to the OED, policy is "a course or general plan of action". In other words, a preservation policy at its most basic is a plan of action for safe keeping. Such a plan of action should address the questions of what needs to be preserved, why, for what purpose, and for how long. In order to address these questions we shall have to look at the function and purpose of the collections themselves, and of those of the institutions in which they are kept.

For example:. It is not necessary to spell out for this audience that although the answers to the questions of what, why and for how long differ with the aims and purpose of the institution or the collection in question, they are also influenced by the nature of the material itself. If we consider a broadly-based international collection, comprising original sources and secondary material, basic research material and ephemera, we will encounter a wide variety of formats. These can include:.

Different formats and different media demand different technical solutions as well as different storage conditions, but the aim and the purpose of a library itself and its functions determine its preservation policy, which should cover all formats and all media. This policy then steers the preservation programme, which sets out the order in which collections or items will be preserved and the method by which this should be done.

Relationship Between Purpose and Preservation Needs

If we talk about the purpose of a collection as a determining element in its preservation needs, we need to look at other library and archive functions that are closely linked to preservation, such as acquisition, retention and access.

How strong these links are and what their relative importance is depends on the purpose of the library or archive in question, as well as on the nature of the material. Although the aims and purposes of the various kinds of libraries and archives vary enormously, they all have some basic objectives in common. All libraries and archives acquire material (or have at one stage in their existence done so), mostly with the aim of making it available at some time or other; and all want to retain some of it for a shorter or longer period of time, some in perpetuity.

If we assume that all research libraries want to make their collections available for use now or in the future, they will have to ensure that those collections can be used and are in a fit state to be used. For conventional materials the human body provides its own access mechanism. Moreover, this is renewed with each new generation. For digital material this is not the case. Eyes alone are not much use when faced with any of the formats or media in which digital data is presented. Unless we have a separate usable and maintainable access mechanism, we simply cannot use the acquired data.

The question of what to preserve is answered in part by the reason why an item is acquired. If it is acquired in order to serve a community of undergraduates for one or at the most two years, there may be reasons for acquiring multiple copies, but little reason for preserving the copies once they have served their very limited purpose.

If on the other hand an item is acquired for permanent addition to and retention in a collection, its preservation becomes as important as its acquisition. Short term use may still call for a short term conservation fix; it does not call for a controlled long term preservation policy.

Selection

If we consider a national deposit library as a library of "last resort" for publications which otherwise may disappear, and as a place where the entire "published archive" of a nation is kept together and is recorded, the principles of selection and acquisition of material are the same whether we talk about conventional or electronic formats.

The way in which these formats are selected and acquired will vary. As a matter of principle, all publications, whether conventional or electronic, should at least be considered for acquisition in a deposit collection. In practice selectivity is forced upon us by constraints in resources, storage space, handling capacity and funding. Selectivity may also be influenced by technical capacity. It has been said that the selection of electronic publications should be limited to those that can be acquired, handled and stored locally by the library.

However, in a digital environment one could equally well argue that giving access to publications that reside elsewhere also fulfils one of the major purposes of any library, namely to make information available to its users (although it is not a deposit function).

Dynamic documents such as frequently-updated online databases pose an acquisition problem that we do not face with conventional texts.


Although one may argue for selective acquisition that is frequent enough to preserve all information contained in such a publication during its lifetime, prohibitive costs may well compel a much greater selectivity aimed at only acquiring representative samples (however difficult it may be to decide what is representative).

Format and Medium

The format in which the information is presented should not influence its selection or non-selection, as a format that cannot be easily handled may be converted to one that the library or archive can handle.

This may be problematic, but it should be attempted; time and effort should be spent to achieve it. Nor should the medium be regarded as a selection criterion. Here again, the information content may be transferred to another medium that can be accommodated. Selection criteria relating to the intrinsic value or importance of the material to be acquired will be the same for conventional and for electronic material.

In libraries where maximum access of most up-to-date material is the prime objective, selection criteria may well be guided by medium or format. The decision of whether or not an item will be retained needs to be made, as well as the decision of whether an item needs to be retained in its original format or in surrogate form. In many cases, the format is as important as, sometimes more important than, the information it contains. Format alone can provide information over and above its contents and there are library and archive users who have a real need to consult the material in its original format.

For many users a surrogate will suffice and can at times be preferred. The decision whether to retain the original once a surrogate has been made is not clear cut. The main reason is the lack of longevity of the storage media for electronic information, coupled with the imminent obsolescence of their retrieval hardware and software. Simply "leaving things as they are" is not an option for digital collections. The choice whether to retain the document as an artefact, or to retain the information it contains, or both, is less of a real choice with electronic material.

If we try and keep electronic publications as artefacts i. On the other hand, if we attempt to retain the content, many aspects of the visual presentation and perhaps even of the "functionality" of the electronic document will be lost. We may also lose what Peter Graham has called the "integrity and authenticity of the information as originally recorded". Experience so far seems to indicate that in the long run the intellectual content of an electronic publication is all we can retain and we shall have to accept at least for the time being that certain interactive dynamic and presentational aspects of the original cannot be retained.

In parallel with conventional publications, the off-line digital publication as a physical object is itself an expression of a part of our culture. It could therefore be argued that we must try to retain at least a representative sample of such physical objects and of their retrieval mechanism, in the knowledge that once the latter have broken down or can no longer be replaced, we will end up not as a functioning library or archive but as a museum of dead digital dodos. Many libraries and archives take the amount of use that is made of their collections as an indication of their preservation needs.

One can argue that the nature and purpose of the use, rather than the amount of use an item may get or is expected to get, is of paramount importance when making retention and preservation decisions. To give low use as a reason for neglect or non-preservation is dangerous. Some material may not be in immediate demand nor in frequent demand, but it may be needed by someone at some stage to increase knowledge or improve understanding. If we believe this, then the model proposed for digital preservation by Donald Waters as the "just-in-time" model versus the just-in-case model of conventional preservation is one that should be used only in awareness of its limitations.

The increasing tendency in some parts of the library world away from collections in favour of access reduces the chances of our long term ability to fulfil the research needs of future generations. Nevertheless, the question of why an item or a collection should be preserved is closely linked to considerations of use and considerations of access.

Only if we want to create a time capsule is there any point in preserving material to which access is withheld and even then, a time capsule is only of value if people know what it contains or if it is opened one day. They have the duty to make their collections available to those who need to use them, now and in many cases also in the future. Providing access to the collections while preserving them for future use can, at least for conventional material, be seen as two conflicting aims.

There are indeed kinds of access that defeat or prevent future use, in the same way as there are preservation methods that inhibit instant access. Nevertheless, such conflicts can be resolved and if the need for, and the purpose of, access are considered carefully, the dilemma between access and preservation is not quite so acute.

Per contra, for digital material we can argue that access can assist preservation. While not in itself sufficient, a high level of systematic access helps to check the usability of electronic publications. The kind of use, the kind of access that is needed, influences preservation decisions and preservation methods.

It has already been pointed out that with electronic material we may not have the choice to preserve both content and physical integrity. We do, however, have the choice whether to preserve electronic documents in digital format, on-line or off-line, and whether we "convert" them for the purpose of long term retention to non-electronic media.

These choices will to some extent be steered by the medium and format of the publication, but also by the type of access that is needed. In many cases electronic publications cannot be preserved as originally received, whether this is because the medium will not survive, or because the technical environment becomes obsolete, or for intrinsic reasons (for instance, networked publications by definition cannot be acquired and stored in their original medium, so have to be converted to another).

If access is needed to the content only, irrespective of any other functional considerations, the cowardly way out may be to convert from electronic media to paper or microfilm. However, such a strategy may only be valid for publications which are not true electronic documents but are just non-interactive static documents distributed on an electronic medium.

For dynamic, interactive documents and multimedia, such conversion is not an option. If we want to preserve publications as electronic publications there are two basic options for their archival storage: either off-line, storing them as physical objects, or on-line, in a database.

These options provide a different kind of access. On-line storage implies on-line access, and a reference in a catalogue will give an on-line storage location, allowing direct access to the publication. If distributed or networked access is necessary, the on-line storage option will be preferable. Having discussed what to preserve and why, the vexed question of how to preserve may well be asked.

This has not been covered here, partly because the author does not feel qualified to do so, at least not for digital material, and partly because the "how" is not really part of a preservation policy. Although human intellect, human understanding, historical and technical knowledge, common sense, energy and a will to succeed are all vital, no preservation policy, no preservation programme, however well conceived, stands a chance of being implemented without sufficient funding.

But preservation is only one of many library and archive functions that cry out for funding. In order to find a proper balance between the funding of preservation and other functions, we must again consider how they are related. Historically, libraries have looked at the balance of funding between acquisitions and preservation, between access and preservation, and sometimes between public services and preservation. In recent decades the balance of funding between computing and telecommunication services and preservation has also been considered.

However, when we talk about the preservation of electronic material, the latter distinction may well disappear. Lack of resources has always stood in the way of the successful implementation of a preservation policy or strategy and will certainly do so no less for electronic material.

Perhaps the situation is even worse. At least once one has conserved a book, one can be reasonably satisfied of its continued existence provided the item is properly stored and not over-handled. Similarly, once one has made a microfilm, provided the film and its production methods are of archival quality and it is stored in the right conditions, the contents of a book or manuscript will be preserved for about years.

However, this is not the case with electronic material. Long term access to such material requires an ongoing commitment to reformat, refresh or migrate data, and only if libraries and archives are willing to commit long term funding and long term effort should they embark on the acquisition and maintenance of electronic collections. To do otherwise is irresponsible. Planning for long term preservation of electronic material is made even more difficult because of the rapid changes in technology and the impossibility of predicting what the state of technology will be, even in the medium term.

The collections form a library and archive's most valuable and most important asset, and the provision of access to those collections their most important duty. The argument has been presented in the past that in an electronic environment a library will become an information broker, an institution that does not own the data but simply enables access to them.

Technology will help. It will continue to improve and to become more and more useful and affordable. The answer may lie in the human ingenuity to develop and use it, but we must also endeavour to make the best possible use of the available resources; we must ensure that we do not duplicate efforts; we must combine to work together, to share the responsibility for preserving our cultural heritage, and we must be selective, in the full knowledge that selectivity is almost certain to damage future research.

It is therefore the more important to be selective in the context of a national or international preservation strategy. It seems fitting to close with words from Northrop Frye: "Society, like the individual, becomes senile in proportion as it loses its continuous memory". In an electronic age these words are not merely a warning, they are a threat.

It presents two main options (on-line and off-line storage), and relates these to The British Library's needs. Finally, estimated costs are stated. The study examined the issues which surround the preservation of digital materials. It started with a literature review, then moved on to a review of the preservation process, developed a statement of objectives, reviewed the preservation options, and considered the resource requirements.

Though few would disagree with this basic idea, in the electronic age it does require understanding of the scope of the term "published". It is proposed that in this context we consider:.

Accessioning: log receipt; assign accession number; check documentation; count copies; check permissions; check media; forward copies to deposit libraries; pass on.

First Handling: check media; send out; virus check; read documentation; load data; run tests; repeat for copies; check keys to usage restrictions; download data; technical notes; pass on.

Record Creation: link accession record, publication, documentation, notes; view and inspect; create bibliographic record and profile; record storage location of data and documentation.

Initial Preservation: label publication; store data online and back-up, or download and store off-line and record location.
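The four stages above can be pictured as an ordered checklist that each deposited item moves through. The following is a minimal sketch only: the stage names and tasks come from the list above (slightly abbreviated), while `WORKFLOW`, `DepositItem` and the accession number format are hypothetical names introduced for illustration and do not describe any system actually used by a deposit library.

```python
from dataclasses import dataclass, field

# Stages and tasks follow the text (abbreviated); the data layout is an assumption.
WORKFLOW = {
    "Accessioning": [
        "log receipt", "assign accession number", "check documentation",
        "count copies", "check permissions", "check media",
        "forward copies to deposit libraries",
    ],
    "First Handling": [
        "check media", "virus check", "read documentation", "load data",
        "run tests", "check keys to usage restrictions", "download data",
        "technical notes",
    ],
    "Record Creation": [
        "link records", "view and inspect",
        "create bibliographic record and profile", "record storage location",
    ],
    "Initial Preservation": [
        "label publication", "store data and record location",
    ],
}


@dataclass
class DepositItem:
    """One deposited publication and the tasks completed for it so far."""
    accession_number: str
    done: dict = field(default_factory=dict)  # stage name -> set of finished tasks

    def complete(self, stage, task):
        """Mark a task as finished, rejecting tasks the stage does not define."""
        if task not in WORKFLOW[stage]:
            raise ValueError(f"{task!r} is not a task of stage {stage!r}")
        self.done.setdefault(stage, set()).add(task)

    def current_stage(self):
        """Return the first stage with outstanding tasks, or None when finished."""
        for stage, tasks in WORKFLOW.items():
            if set(tasks) - self.done.get(stage, set()):
                return stage
        return None
```

A new `DepositItem("ACC-0001")` reports `"Accessioning"` as its current stage and moves on to `"First Handling"` only once every accessioning task has been marked complete, which mirrors the "pass on" step at the end of each stage.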

Some of these tasks are familiar to libraries from the handling of books. Others are unique to digital material; of these, some (eg checking documentation) can represent enormous levels of human effort. Although this approach is followed at the Library of Congress to an extent, it is not practical for very high volumes. It also raises problems of media, security and standards.

Ensure Data is Not Lost

This can be taken to mean that the data is preserved for "digital archaeologists" of future generations to decode.

This would mean that no effort is made to make the data accessible or usable for immediate or medium term access. The platters would be stored in controlled conditions, and the data would be refreshed by copying to new platters or other media every ten years. This approach presumes that some issues can be overcome. For example, some data cannot be easily copied from some media (eg some existing CD-ROMs); and current CD-R technology does not automatically perform read-after-write checking, so something needs to be done to ensure the integrity of CD-R copies.
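One way to compensate for the missing read-after-write check is to re-read each refreshed copy and compare a checksum with that of the source. The sketch below assumes both copies are visible as ordinary files; the function names are hypothetical, and SHA-256 is a present-day choice of checksum rather than anything specified in the study.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, target):
    """Copy source to target, then re-read the target and compare digests.

    Raises OSError if the copy does not match the source. Returns the
    digest so that it can be recorded alongside the storage location.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    actual = sha256_of(target)
    if actual != expected:
        raise OSError(f"verification failed for {target}")
    return expected
```

Recording the returned digest with the item allows the same comparison to be repeated at each later refresh. Note that on a real system the verification read may be served from an operating system cache rather than from the new medium, so a production check would need to force a read from the medium itself.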

In the absence of complete, issue-free solutions to the problems, the challenge is to start managing the data now, in the assumption that the answers for long term preservation will emerge naturally. Ensure Data Can be Interpreted in Future There are three ways in which we can manage data to make sure that future generations will be able to make use of it:.

The cost includes an allocation of the costs of providing PC workstations for access. For digital publications which are "similar" to paper documents, the pragmatic approach will be to convert all unformatted text into ASCII format for preservation, and to convert formatted text to a portable, platform-independent format such as Adobe's PDF (Acrobat) format. Ideally, a completely open format will be adopted. Both these are to be avoided as much as possible. Clearly, the state of the technological art in the field of digital preservation means that we have to tread very carefully when taking long term decisions.
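The conversion of unformatted text to ASCII mentioned above is lossy for any character outside the ASCII range, so it is useful to know how much is lost. The sketch below is one possible approach, not the method used in the study: the function name is hypothetical, and the choice to strip accents via Unicode decomposition and to replace anything else with "?" is an assumption made for illustration.

```python
import unicodedata


def to_preservation_ascii(text):
    """Reduce text to plain ASCII and count the characters that were lost.

    Accented letters are decomposed and their accents dropped. Any
    remaining non-ASCII character is replaced by "?" and counted.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    encoded = stripped.encode("ascii", errors="replace")
    # Question marks already present in the text are not losses.
    lost = encoded.count(b"?") - stripped.count("?")
    return encoded.decode("ascii"), lost
```

A non-zero loss count is a signal that the item may be better kept in a richer format as well, since the ASCII copy alone no longer carries the full text.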

The high costs and risks point to the need to be very selective in preserving digital works. Copying of some to paper or microfiche may remain the most desirable option. The description or definition of "strategy" in this context was the subject of some discussion, and there was no attempt to develop a formal, complete definition. The principal conclusion concerned the importance of establishing and maintaining a momentum. Notwithstanding the relevance of adopting a strategy, it was felt that some actions should be initiated as soon as possible, so that there is not an inordinate delay while a thorough strategy is produced.

The higher education sector has its own needs; it will have to create its own solutions, rather than relying entirely on other institutions such as The British Library. Two divergent views were represented in the group, namely:. The limited time available restricted discussion to Management and Resource issues. There was considerable debate on the meaning of the term "publishing", with a conclusion that the debate is more of concern to national libraries than to the higher education sector.

Facilitate Debate: Take actions to facilitate debate on the issues surrounding preservation. This should encourage a move towards long term commitment to the idea of preserving digital materials. The vehicles for such debate should include existing organisations such as CURL.

Sensitise IT Suppliers: Take actions to make leading members of the IT supply side aware of preservation issues, eg in relation to legal deposit.

Develop Guidelines: Stimulate the development of "permissive guidelines" for digital preservation, that is guidelines which suggest approaches but which are not overly limiting or prescriptive. This can include the identification of responsibilities and the identification of gaps. A wide range of stakeholders should be involved.

Influence Research Funders: Make major research funders aware of digital preservation issues. Encourage them to develop digital archiving policies, such as those in place at the ESRC; suggest that preservation considerations be required in all funding proposals.


