You will develop a short 1300 word document.
Task: Describe how OSINT can be used to supplement your organizational collection plan, identify 10 sites that can be used to research sites/domains for:
What Is Metadata?
What Does Metadata Do?
Structuring Metadata
Metadata Schemes and Element Sets
  Dublin Core
  TEI and METS
  MODS
  EAD and LOM
  <indecs>, ONIX, CDWA, and VRA
  MPEG
  FGDC and DDI
Creating Metadata
Interoperability and Exchange of Metadata
Future Directions
More Information on Metadata
Glossary
Acknowledgements
Understanding Metadata is a revision and expansion of Metadata Made Simpler: A guide for libraries published by NISO Press in 2001. NISO Press extends its thanks and appreciation to Rebecca Guenther and Jacqueline Radebaugh, staff members in the Library of Congress Network Development and MARC Standards Office, for sharing their expertise and contributing to this publication.
About NISO
NISO, a non-profit association accredited by the American National Standards Institute (ANSI), identifies, develops, maintains, and publishes technical standards to manage information in our changing and ever-more digital environment. NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, re-purposing, storage, metadata, and preservation. NISO standards and information about NISO’s activities and membership are featured on the NISO website <http://www.niso.org>.
This booklet is available for free on the NISO website (www.niso.org) and in hardcopy from NISO Press.
Published by:
NISO Press
National Information Standards Organization
4733 Bethesda Avenue, Suite 300
Bethesda, MD 20814 USA
Email: [email protected]
Tel: 301-654-2512
Fax: 301-654-1721
URL: www.niso.org
Copyright © 2004 National Information Standards Organization ISBN: 1-880124-62-9
What Is Metadata?
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.
The term metadata is used differently in different communities. Some use it to refer to machine understandable information, while others use it only for records that describe electronic resources. In the library environment, metadata is commonly used for any formal scheme of resource description, applying to any type of object, digital or non-digital. Traditional library cataloging is a form of metadata; MARC 21 and the rule sets used with it, such as AACR2, are metadata standards. Other metadata schemes have been developed to describe various types of textual and non-textual objects including published books, electronic documents, archival finding aids, art objects, educational and training materials, and scientific datasets.
There are three main types of metadata:
• Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
• Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.
• Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that sometimes are listed as separate metadata types are:
− Rights management metadata, which deals with intellectual property rights, and
− Preservation metadata, which contains information needed to archive and preserve a resource.
Metadata can describe resources at any level of aggregation. It can describe a collection, a single resource, or a component part of a larger resource (for example, a photograph in an article). Just as catalogers make decisions about whether a catalog record should be created for a whole set of volumes or for each particular volume in the set, so the metadata creator makes similar decisions. Metadata can also be used for description at any level of the information model laid out in the IFLA (International Federation of Library Associations and Institutions) Functional Requirements for Bibliographic Records: work, expression, manifestation, or item. For example, a metadata record could describe a report, a particular edition of the report, or a specific copy of that edition of the report.
Metadata can be embedded in a digital object or it can be stored separately. Metadata is often embedded in HTML documents and in the headers of image files. Storing metadata with the object it describes ensures the metadata will not be lost, obviates problems of linking between data and metadata, and helps ensure that the metadata and object will be updated together. However, it is impossible to embed metadata in some types of objects (for example, artifacts). Also, storing metadata separately can simplify the management of the metadata itself and facilitate search and retrieval. Therefore, metadata is commonly stored in a database system and linked to the objects described.
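To make the idea of embedded metadata concrete, here is a minimal sketch that reads name/content pairs from the <meta> tags of an HTML document using Python's standard html.parser module. The sample page and the DC.-prefixed element names are assumptions chosen for illustration, not taken from a real site.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # An element may repeat, so keep a list of values per name.
            self.metadata.setdefault(name, []).append(content)


# Hypothetical page with metadata embedded in its header.
SAMPLE_HTML = """
<html>
  <head>
    <title>Metadata Demystified</title>
    <meta name="DC.publisher" content="The Sheridan Press">
    <meta name="DC.description"
          content="Presents an overview of metadata conventions in publishing.">
  </head>
  <body><p>Report text goes here.</p></body>
</html>
"""


def extract_embedded_metadata(html_text):
    """Return a dict mapping each meta name to its list of content values."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    return parser.metadata


embedded = extract_embedded_metadata(SAMPLE_HTML)
print(embedded)
```

Because the metadata travels inside the file, no separate lookup is needed to find it; the trade-off, as noted above, is that searching across many objects is usually easier when the same values are also held in a database.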
What Does Metadata Do?
An important reason for creating descriptive metadata is to facilitate discovery of relevant information. In addition to resource discovery, metadata can help organize electronic resources, facilitate interoperability and legacy resource integration, provide digital identification, and support archiving and preservation.
Resource Discovery
Metadata serves the same functions in resource discovery as good cataloging does by:
• allowing resources to be found by relevant criteria;
• identifying resources;
• bringing similar resources together;
• distinguishing dissimilar resources; and
• giving location information.
Organizing Electronic Resources
As the number of Web-based resources grows exponentially, aggregate sites or portals are increasingly useful in organizing links to resources based on audience or topic. Such lists can be built as static webpages, with the names and locations of the resources “hardcoded” in the HTML. However, it is more efficient and increasingly more common to build these pages dynamically from metadata stored in databases. Various software tools can be used to automatically extract and reformat the information for Web applications.
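The difference between a hardcoded list and a dynamically built one can be sketched in a few lines. The example below assumes a hypothetical resources table (title, url, topic) in an in-memory SQLite database and generates an HTML list for one topic; the table layout and sample rows are illustrative only.

```python
import sqlite3
from html import escape


def build_resource_list(connection, topic):
    """Render an HTML list of links for one topic from stored metadata."""
    rows = connection.execute(
        "SELECT title, url FROM resources WHERE topic = ? ORDER BY title",
        (topic,),
    ).fetchall()
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in rows
    )
    return f"<ul>\n{items}\n</ul>"


# Hypothetical metadata store; a real portal would use a persistent database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE resources (title TEXT, url TEXT, topic TEXT)")
connection.executemany(
    "INSERT INTO resources VALUES (?, ?, ?)",
    [
        ("NISO", "http://www.niso.org", "standards"),
        ("Example Art Portal", "http://example.org/art", "art"),
    ],
)

page_fragment = build_resource_list(connection, "standards")
print(page_fragment)
```

Adding or relocating a resource then means updating one database row rather than editing every page that links to it.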
Interoperability
Describing a resource with metadata allows it to be understood by both humans and machines in ways that promote interoperability. Interoperability is the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality. Using defined metadata schemes, shared transfer protocols, and crosswalks between schemes, resources across the network can be searched more seamlessly.
Two approaches to interoperability are cross-system search and metadata harvesting. The Z39.50 protocol is commonly used for cross-system search. Z39.50 implementers do not share metadata but map their own search capabilities to a common set of search attributes. A contrasting approach taken by the Open Archives Initiative is for all data providers to translate their native metadata to a common core set of elements and expose this for harvesting. A search service provider then gathers the metadata into a consistent central index to allow cross-repository searching regardless of the metadata formats used by participating repositories.
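The harvesting approach can be sketched as follows. This is not an implementation of any real harvesting protocol; it only illustrates the idea of each provider translating native fields to a common core through a crosswalk, with a service provider gathering the results into one index. The provider names, native field names, and sample records are all hypothetical.

```python
# Hypothetical crosswalks: native field name -> common core element.
CROSSWALKS = {
    "library_catalog": {"main_title": "title", "author": "creator"},
    "museum_db": {"object_name": "title", "maker": "creator"},
}


def to_common_core(provider, native_record):
    """Translate one native record; fields without a mapping are dropped."""
    mapping = CROSSWALKS[provider]
    return {
        mapping[field]: value
        for field, value in native_record.items()
        if field in mapping
    }


def harvest(repositories):
    """Gather translated records from every provider into one central index."""
    index = []
    for provider, records in repositories.items():
        for record in records:
            common = to_common_core(provider, record)
            common["provider"] = provider
            index.append(common)
    return index


def search(index, element, term):
    """Case-insensitive substring search over one common core element."""
    term = term.lower()
    return [r for r in index if term in r.get(element, "").lower()]


# Hypothetical repositories using different native formats.
repositories = {
    "library_catalog": [
        {"main_title": "Metadata Demystified", "author": "Brand, Amy", "shelf": "Z666"},
    ],
    "museum_db": [
        {"object_name": "Metadata wall chart", "maker": "Unknown"},
    ],
}

central_index = harvest(repositories)
hits = search(central_index, "title", "metadata")
print(hits)
```

Note that the unmapped "shelf" field is lost in translation, which is the kind of content loss that a common core set of elements trades for cross-repository searching.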
Digital Identification
Most metadata schemes include elements such as standard numbers to uniquely identify the work or object to which the metadata refers. The location of a digital object may also be given using a file name, URL (Uniform Resource Locator), or some more persistent identifier such as a PURL (Persistent URL) or DOI (Digital Object Identifier). Persistent identifiers are preferred because object locations often change, making the standard URL (and therefore the metadata record) invalid. In addition to the actual elements that point to the object, the metadata can be combined to act as a set of identifying data, differentiating one object from another for validation purposes.
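Why a persistent identifier keeps a metadata record valid can be shown with a toy resolver. The sketch below is a simplified stand-in for a resolution service, not the actual PURL or DOI mechanism; the identifier string and URLs are made up for illustration.

```python
class IdentifierResolver:
    """Toy resolver mapping a persistent identifier to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, persistent_id, url):
        """Record (or update) where the identified object currently lives."""
        self._locations[persistent_id] = url

    def resolve(self, persistent_id):
        """Return the current location for a persistent identifier."""
        return self._locations[persistent_id]


resolver = IdentifierResolver()
resolver.register("example:report-42", "http://old.example.org/reports/42.pdf")

# The metadata record stores the persistent identifier, not the raw URL.
metadata_record = {"title": "Annual Report", "identifier": "example:report-42"}

# The object moves: only the resolver entry changes, the record is untouched.
resolver.register("example:report-42", "http://new.example.org/archive/42.pdf")

current_location = resolver.resolve(metadata_record["identifier"])
print(current_location)
```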
Archiving and Preservation
Most current metadata efforts center around the discovery of recently created resources. However, there is a growing concern that digital resources will not survive in usable form into the future. Digital information is fragile; it can be corrupted or altered, intentionally or unintentionally. It may become unusable as storage media and hardware and software technologies change. Format migration and perhaps emulation of current hardware and software behavior in future hardware and software platforms are strategies for overcoming these challenges.
Metadata is key to ensuring that resources will survive and continue to be accessible into the future. Archiving and preservation require special elements to track the lineage of a digital object (where it came from and how it has changed over time), to detail its physical characteristics, and to document its behavior in order to emulate it on future technologies.
Many organizations internationally have worked on defining metadata schemes for digital preservation, including the National Library of Australia, the British Cedars Project (CURL Exemplars in Digital Archives), and a joint Working Group of OCLC and the Research Libraries Group (RLG).
The latter group developed a framework outlining types of preservation metadata. A follow-up group, PREMIS (PREservation Metadata: Implementation Strategies), also sponsored by OCLC and RLG, is developing a set of core elements and strategies for the encoding, storage, and management of preservation metadata within a digital preservation system. Many of these initiatives are based on or compatible with the ISO Reference Model for an Open Archival Information System (OAIS).
Structuring Metadata
Metadata schemes (also called schema) are sets of metadata elements designed for a specific purpose, such as describing a particular type of information resource. The definition or meaning of the elements themselves is known as the semantics of the scheme. The values given to metadata elements are the content. Metadata schemes generally specify names of elements and their semantics. Optionally, they may specify content rules for how content must be formulated (for example, how to identify the main title), representation rules for content (for example, capitalization rules), and allowable content values (for example, terms must be used from a specified controlled vocabulary).
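The parts of a scheme described above (element names, semantics, and optional allowable values) can be modeled directly. The following sketch uses a small hypothetical scheme with two elements and a made-up controlled vocabulary; it is meant only to show how the pieces relate, not to reproduce any published scheme.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ElementDefinition:
    """One element of a scheme: its name, semantics, and allowed values."""

    name: str
    semantics: str
    allowed_values: Optional[FrozenSet[str]] = None  # None means free text


# Hypothetical two-element scheme with one controlled vocabulary.
SCHEME = {
    "title": ElementDefinition("title", "The name given to the resource."),
    "type": ElementDefinition(
        "type",
        "The nature or genre of the resource.",
        frozenset({"text", "image", "sound"}),
    ),
}


def validate(record, scheme):
    """Return a list of problems found when checking content against a scheme."""
    problems = []
    for element, value in record.items():
        definition = scheme.get(element)
        if definition is None:
            problems.append(f"unknown element: {element}")
        elif (
            definition.allowed_values is not None
            and value not in definition.allowed_values
        ):
            problems.append(f"{element}: {value!r} is not in the controlled vocabulary")
    return problems


good_record = {"title": "Metadata Demystified", "type": "text"}
bad_record = {"type": "hologram", "colour": "red"}

print(validate(good_record, SCHEME))
print(validate(bad_record, SCHEME))
```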
There may also be syntax rules for how the elements and their content should be encoded. A metadata scheme with no prescribed syntax rules is called syntax independent. Metadata can be encoded in any definable syntax. Many current metadata schemes use SGML (Standard Generalized Mark-up Language) or XML (Extensible Mark-up Language). XML, developed by the World Wide Web Consortium (W3C), is a simplified subset of SGML that, unlike HTML with its fixed tag set, allows for locally defined tag sets and the easy exchange of structured information. SGML, from which both HTML and XML derive, allows for the richest mark-up of a document. Useful XML tools are becoming widely available as XML plays an increasingly crucial role in the exchange of a variety of data on the Web.
Dublin Core Example
Description="Presents an overview of metadata conventions in publishing."
Publisher="The Sheridan Press"
Identifier="http://www.niso.org/standards/resources/Metadata_Demystified.pdf"
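As a rough illustration of XML as an encoding syntax, the element values from the Dublin Core example in this section could be serialized with Python's standard xml.etree.ElementTree module. The namespace URI is the one used for the Dublin Core element set; the wrapper element name "metadata" is an assumption made for this sketch, since the source does not prescribe a container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def encode_record(values):
    """Serialize (element, content) pairs as namespaced XML child elements."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, content in values:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = content
    return ET.tostring(root, encoding="unicode")


record_xml = encode_record(
    [
        ("description", "Presents an overview of metadata conventions in publishing."),
        ("publisher", "The Sheridan Press"),
        ("identifier", "http://www.niso.org/standards/resources/Metadata_Demystified.pdf"),
    ]
)
print(record_xml)
```

The same pairs could equally be written as HTML meta tags or database rows, which is what it means for a scheme to be syntax independent.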
Metadata Schemes and Element Sets
Many different metadata schemes are being developed in a variety of user environments and disciplines. Some of the most common ones are discussed in this section.
Dublin Core
The Dublin Core Metadata Element Set arose from discussions at a 1995 workshop sponsored by OCLC and the National Center for Supercomputing Applications (NCSA). As the workshop was held in Dublin, Ohio, the element set was named the Dublin Core. The continuing development of the Dublin Core and related specifications is managed by the Dublin Core Metadata Initiative (DCMI).
The original objective of the Dublin Core was to define a set of elements that could be used by authors to describe their own Web resources. Faced with a proliferation of electronic resources and the inability of the library profession to catalog all these resources, the goal was to define a few elements and some simple rules that could be applied by noncatalogers. The original 13 core elements were later increased to 15: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.
The Dublin Core was developed to be simple and concise, and to describe Web-based documents. However, Dublin Core has been used with other types of materials and in applications demanding some complexity. There has historically been some tension between supporters of a minimalist view, who emphasize the need to keep the elements to a minimum and the semantics and syntax simple, and supporters of a structuralist view who argue for finer semantic distinctions and more extensibility for particular communities.
These discussions have led to a distinction between qualified and unqualified (or simple) Dublin Core. Qualifiers can be used to refine (narrow the scope of) an element, or to identify the encoding scheme used in representing an element value. The element Date, for example, can be used with the refinement qualifier created to narrow the meaning of the element to the date the object was created. Date can also be used with an encoding scheme qualifier to identify the format in which the date is recorded, for example, following the ISO 8601 standard for representing date and time.
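The two kinds of qualifier can be sketched as fields on a metadata statement. In the example below the dictionary layout and the "ISO8601" scheme label are assumptions for illustration, and the check covers only the plain calendar-date form YYYY-MM-DD, which is one of the formats ISO 8601 allows.

```python
from datetime import date


def make_statement(element, value, refinement=None, encoding_scheme=None):
    """Build one metadata statement, checking the value against its scheme."""
    if encoding_scheme == "ISO8601":  # hypothetical label for this sketch
        # Raises ValueError if the value is not a valid YYYY-MM-DD date.
        date.fromisoformat(value)
    return {
        "element": element,
        "refinement": refinement,
        "encoding_scheme": encoding_scheme,
        "value": value,
    }


# Simple (unqualified) use: just an element and a value.
simple = make_statement("date", "2004")

# Qualified use: refined to "created" and encoded as an ISO 8601 date.
qualified = make_statement(
    "date", "2004-05-01", refinement="created", encoding_scheme="ISO8601"
)

print(simple)
print(qualified)
```

A consumer that ignores the qualifiers can still read the qualified statement as a plain Date, while one that understands them knows both what the date means and how to parse it.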
All Dublin Core elements are optional and all are repeatable. The elements may be presented in any order. While the Dublin Core description recommends the use of controlled values for fields where they are appropriate (for example, controlled vocabularies for the Subject field), this is not required. However, working groups have been established to discuss authoritative lists for certain elements such as Resource Type. While Dublin Core leaves content rules to the particular implementation, the DCMI encourages the adoption of application profiles (domain-specific rules) for particular domains such as education and government. An application profile for libraries is being developed by the Libraries Working Group.
Because of its simplicity, the Dublin Core element set is now used by many outside the library community: researchers, museum curators, and music collectors, to name only a few. There are hundreds of projects worldwide that use the Dublin Core either for cataloging or to collect data from the Internet; more than 50 of these have links on the DCMI website. The subjects range from cultural heritage and art to math and physics. Meanwhile the Dublin Core Metadata Initiative has expanded beyond simply maintaining the Dublin Core Metadata Element Set into an organization that describes itself as “dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for discovery systems.”
The Text Encoding Initiative (TEI)
The Text Encoding Initiative is an international project to develop guidelines for marking up electronic texts such as novels, plays, and poetry, primarily to support research in the humanities. In addition to specifying how to encode the text of a work, the TEI Guidelines for Electronic Text Encoding and Interchange also specify a header portion, embedded in the resource, that consists of metadata about the work. The TEI header, like the rest of the TEI, is defined as an SGML DTD (Document Type Definition), a set of tags and rules defined in SGML syntax that describe the structure and elements of a document. This SGML mark-up becomes part of the electronic resource itself. Since the TEI DTD is rather large and complicated in order to apply to a vast range of texts and uses, a simpler subset of the DTD, known as TEI Lite, is commonly used in libraries.
It is assumed that TEI-encoded texts are electronic versions of printed texts. Therefore the TEI Header can be used to record bibliographic information about both the electronic version of the text and about the non-electronic source version. The basic bibliographic information is similar to that recorded in library cataloging and can be mapped to and from MARC. However, there are also elements defined to record details about how the text was transcribed and edited, how mark-up was performed, what revisions were made, and other non-bibliographic facts. Libraries tend to use TEI headers when they have collections of SGML-encoded full text. Some libraries use TEI headers to derive MARC records for their catalogs, while others use MARC records as the basis for creating TEI header descriptions for the source texts.
Metadata Encoding and Transmission Standard (METS)
The Metadata Encoding and Transmission Standard (METS) was developed to fill the need for a standard data structure for describing complex digital library objects. METS is an XML Schema for creating XML document instances that express the structure of digital library objects, the associated descriptive and administrative metadata, and the names and locations of the files that comprise the digital object.
The metadata necessary for successful management and use of digital objects is both more extensive than and different from the metadata used for managing collections of printed works and other physical materials. Structural metadata is needed to ensure that separately digitized files (for example, different pages of a digitized book) are structured appropriately. Technical metadata is needed for information about the digitization process so that scholars may determine how accurate a reflection of the original the digital version provides. Other technical metadata is required for internal purposes in order to periodically refresh and migrate the data, ensuring the durability of valuable resources.
METS was originally an outgrowth of the Making of America II project, a digitization project of major research libraries that attempted to address these metadata issues, in part by providing an encoding format for metadata for textual and image-based works. The Digital Library Federation (DLF) built on that earlier work to create METS, a standard schema for providing a method for expressing and packaging together descriptive, administrative, and structural metadata for objects within a digital library. Expressed using the XML schema language, METS provides a document format for encoding the metadata necessary for management of digital library objects within a repository and for exchange between repositories.
Metadata in Action
An oral historian makes tape recordings of interviews with members of a particular ethnic group. Interviewees sign a paper release form giving intellectual property rights to the historian. Most interviewees grant permission to disseminate the interviews in print and electronically, but several restrict publication and dissemination until 25 years after death.
Information about each interview is kept in a database: Interviewer, Interviewee, Date, Place, etc. Each interview follows a questionnaire format. The questionnaire exists as a text file. The tapes, release forms, database, and text file are donated to a library that has a special collection focusing on the particular ethnic group.
The tapes are digitized. Since each interview runs over several tapes, technicians record structural metadata to keep component parts of each interview together. Technicians record administrative metadata such as file names, location of each interview in the files, equipment used, the methods of digitizing and assuring quality and completeness, file formats, etc. Different segments of this metadata allow the audio files to be automatically tracked, accessed, stored, refreshed, and migrated.
An archivist expands the database to include the persistent identifier of each interview, thereby linking the audio file to the descriptive metadata. The names of the data elements are revised to match Dublin Core terminology, including qualifiers used specifically for audio
(continued below)
A METS document contains seven major sections:
• METS Header – Contains metadata describing the METS document itself, including such information as creator, editor, etc.
• Descriptive Metadata – Points to descriptive metadata external to the METS document (for example, a MARC record in an OPAC or an Encoded Archival Description finding aid maintained on a webserver), or to internally embedded descriptive metadata, or both.
• Administrative Metadata – Provides information regarding how the files are created and stored, intellectual property rights, the original source object from which the digital library object derives, and the provenance of the files comprising the digital library object.
• File Section – Lists all files containing content that comprise the electronic versions of the digital object.
• Structural Map – Outlines a hierarchical structure for the digital library object and links the elements of that structure to content files and metadata that pertain to each element.
• Structural Links – Allows METS creators to record the existence of hyperlinks between nodes in the hierarchy outlined in the Structural Map.
• Behavior – Associates executable behaviors with content in the METS object.
The METS header, file section, structural map, structural links, and behavior sections are defined within the METS schema. METS is less prescriptive about descriptive and administrative metadata, relying on extension schemas (externally developed metadata schemes) to provide specific elements. The METS Editorial Board has endorsed three descriptive metadata schemes: simple Dublin Core, MARCXML, and MODS (discussed below).
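The seven sections can be pictured as the top-level children of a METS document. The sketch below builds such an outline with Python's xml.etree.ElementTree, using the METS namespace and element names as the author of this example understands them from the METS schema. The result is an empty outline only: a real METS document needs required attributes and content (for example, the structural map must contain divisions), so this skeleton would not validate as written.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Element names for the seven sections, in the order listed above.
SECTION_ELEMENTS = [
    "metsHdr",      # METS Header
    "dmdSec",       # Descriptive Metadata
    "amdSec",       # Administrative Metadata
    "fileSec",      # File Section
    "structMap",    # Structural Map
    "structLink",   # Structural Links
    "behaviorSec",  # Behavior
]


def build_mets_skeleton():
    """Return a <mets> root with one empty child per major section."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTION_ELEMENTS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


skeleton = build_mets_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```

In practice the descriptive and administrative sections would wrap or point to records in an extension schema such as simple Dublin Core, MARCXML, or MODS.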
For technical metadata the METS website makes available schemas for text and digital still images. The latter standard is called MIX, Metadata for Images in XML Schema, and is based on a proposed NISO standard, Z39.87, Data Dictionary: Technical Metadata for Digital Still Images. Further work is in process on extension schemas for audio, video, and websites. Another current area of concentration for the METS development community is the creation of METS application profiles to give guidance regarding the creation of METS documents for particular object types.
Use of the METS schema is widespread. A list of implementation registries using METS, a tutorial, and other important information can be found on the METS website.
Metadata Object Description Schema (MODS)
The Metadata Object Description Schema (MODS) is a descriptive metadata schema that is a derivative of MARC 21 and intended to either carry selected data from existing MARC 21 records or enable the creation of original resource description records. It includes a subset of MARC fields and uses language-based tags rather than the numeric ones used in MARC 21 records. In some cases, it regroups elements from the MARC 21 bibliographic format. Like METS, MODS is expressed using the XML schema language.
Although the MODS standard can stand on its own, it may also complement other metadata formats. Because of its flexibility and use of XML, MODS may potentially be used as a Z39.50 Next Generation specified format, an extension schema to METS, a metadata set for harvesting, and for creating original resource metadata records in an XML syntax.
Rich description of electronic resources is a particular focus of MODS, which provides some advantages over other metadata schemes. MODS elements are richer than the Dublin Core; its elements are more compatible with library data than the ONIX or Dublin Core standards; and it is simpler to apply than the full MARC 21 bibliographic format. With its use of XML Schema language, MODS offers enhancements over MARC 21, such as the use of an optional ID attribute to facilitate linking at the element level; the ability to specify language, script, and transliteration scheme at the element level; and the ability to embed a rich description of components in the related Item element.
The ability in MODS to give granular descriptions of constituent parts of an object works particularly well with the METS structural map for complex digital library objects.
Metadata in Action (continued)
materials. Information on rights and permissions is entered.
An archivist creates an EAD finding aid for the audio collection using the database as the core. Portions of the questionnaire text file are incorporated as a rich source of subject keywords. A MARC record is derived from the EAD finding aid and added to OCLC and RLIN.
A webpage is created where researchers can access the finding aid, search the database, and listen to the audio files. Interviews coded as restricted are invisible to the search program until the date when they become open to the public. Administrative, structural, and descriptive metadata is created for the webpage to hold all the pieces together, allow them to be managed, and allow them to be accessed.
The library participates in a metadata harvesting protocol to provide extracts of local metadata in a common format to a service provider so that information about the collection is automatically included in a number of relevant tools such as catalogs and portals.
The webpage is linked to the library’s website dedicated to resources about the ethnic group, where it is available to researchers in context with archival and visual materials, digitized secondary sources, etc. Administrative, structural, and descriptive metadata at the website level has also been created.
A MODS Record Example
<mods>
  <titleInfo>
    <title>Metadata demystified</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <place>
      <placeTerm type="text">Bethesda, MD</placeTerm>
    </place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>
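Because MODS uses language-based tags, a record like the example in this section can be read with ordinary XML tools. The sketch below parses that record with Python's xml.etree.ElementTree and pulls out a few fields. The opening <place> tag and the closing </mods> tag are assumptions added here so the fragment is well-formed; the record is also shown without a namespace, as in the example.

```python
import xml.etree.ElementTree as ET

# The MODS record example, with <place> and </mods> supplied as assumptions
# so that the fragment parses as well-formed XML.
MODS_RECORD = """
<mods>
  <titleInfo>
    <title>Metadata demystified</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <place>
      <placeTerm type="text">Bethesda, MD</placeTerm>
    </place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>
"""


def summarize(mods_xml):
    """Extract title, author name, and ISBN from a namespace-free MODS record."""
    root = ET.fromstring(mods_xml)
    family = root.findtext("name/namePart[@type='family']")
    given = root.findtext("name/namePart[@type='given']")
    return {
        "title": root.findtext("titleInfo/title"),
        "author": f"{family}, {given}",
        "isbn": root.findtext("identifier[@type='isbn']"),
    }


summary = summarize(MODS_RECORD)
print(summary)
```

The type attributes on namePart and identifier are what let a program tell a family name from a given name, or an ISBN from another identifier, without relying on element order.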
The Encoded Archival Description (EAD)