IESR Metadata Review March 2004

Ann Apps


Summary: Status of reviewed metadata

Metadata Changes in Version 2.0

  • dc:description changes to dcterms:abstract for collection and service (1.1)
  • rslpcd:hasPublication changes to dcterms:isReferencedBy for collection (1.1)
  • Add iesr:logo for service (5.4)
  • Change Agent metadata properties as in section 6
  • Add OpenGIS as access method (8.1.1), although further investigation is needed to define the interface details
  • Add ftp as access method (8.1.4).
  • usesControlledList will be optional (8.3.1)
  • Collection access list will be free text (8.7)
  • Add UDC as a subject scheme. (Already used by suppliers)
  • Add LCSH as a spatial scheme. (Stakeholder request)

Metadata Proposals for Next Review

  • Add dcterms:extent for collection (1.2)
  • hasDescription / isDescriptionOf and hasPart for collection (2.1)
  • Different descriptions for audiences (4.1)
  • Format of items of collection (4.2.1)
  • Service subject, correlation of transactional service interfaces, output property for more access methods, service type list (5.2,3,5,6)
  • Audience in admin metadata (7.4)
  • Add rsync as access method (8.1.5)
  • Dewey as a mandatory subject scheme (8.2)
  • Collection type scheme (8.4)
  • Other dates (8.5)
  • Other audience types (8.10)
  • Further authentication requirements, Athens resource name, Athens prefix (8.9.1,2,4)
  • Include local identifiers for entities (Stakeholder request)
  • Investigate alternative representation for webcgi interface details
  • Definition of iesr:logo beyond simply: its value is a URL that will yield a graphic on a web page

1. Dublin Core Collection Description Application Profile

[Decision: change some properties now, as indicated. Others will not be changed.]

We should change IESR metadata to bring it in line with DC CD AP as far as possible.
We have a window of opportunity now because IESR metadata is not yet in use apart from data supply. We can still accept the old terms for data supply (there will be a 1-1 correspondence).

We will not make any further changes to the metadata, apart from additions.

Pro: use of Open Standards rather than becoming increasingly proprietary
Con: not all of DC CD AP is agreed yet. In particular namespace decisions have not yet been made. Thus we can change only parts.

1.1 Properties to change

If we decide to make changes these are candidates:

[Decision: change 1 and 3, leave 2 as is.]

Existing Property        Proposed Property
dc:description           dcterms:abstract
rslpcd:hasAssociation    dc:relation
rslpcd:hasPublication    dcterms:isReferencedBy
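
The 1-1 correspondence between old and new terms could be applied at data-supply time. The following is only a minimal sketch, assuming that properties arrive as prefixed names and that the two agreed changes apply to collections and services as listed in the summary; `TERM_MAP` and `normalise_property` are hypothetical names, not part of the IESR software.

```python
# Hypothetical mapping of superseded terms to their replacements,
# keyed by entity type. Only the two agreed changes are included;
# rslpcd:hasAssociation is deliberately left unchanged.
TERM_MAP = {
    "collection": {
        "dc:description": "dcterms:abstract",
        "rslpcd:hasPublication": "dcterms:isReferencedBy",
    },
    "service": {
        "dc:description": "dcterms:abstract",
    },
}


def normalise_property(entity_type, name):
    """Return the current term for a supplied property name.

    Names with no entry for the given entity type pass through unchanged,
    so old and new terms can both be accepted from suppliers.
    """
    return TERM_MAP.get(entity_type, {}).get(name, name)
```

Agent descriptions are not in the map, so a supplied dc:description for an agent is left as it is.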

1.2 Property to add?

[Decision: defer until next review.]

Should we add dcterms:extent in addition to dc:format?

This would make us compatible with DC CD AP. But have we had any request for it?

1.3 Other Properties that could change

[Decision: no change.]

The proposed namespace cld is not yet defined, nor is its status clear, so it is probably not a good idea to change to terms in cld. This affects:

Existing Property           cld Property
rslpcd:contentsDateRange    cld:contentsDateRange
rslpcd:owner                cld:owner
iesr:logo                   cld:logo

Note: this is mainly about metadata made public from IESR. So it would be a good idea to be consistent with DC CD AP. But we cannot make this change until the DC Usage Board makes decisions about the namespace - have they done so yet?

It is probably best to leave 'logo' in the iesr namespace because its definition for cld/gen is not yet decided.

1.4 Other Properties in rslpcd and iesr

[Decision: no change.]

Other rslpcd terms currently used. These are all for service and so beyond DC CD scope at present. I suggest we leave these as rslpcd. I assume the rslpcd namespace will still be valid even when DC CD AP is agreed. (Conclusions from the workshop seemed to be that an updated RSLP was needed of which DC CD AP is a subset.)

  • rslpcd:locator
  • rslpcd:seeAlso
  • rslpcd:administrator

Other iesr terms used. I suggest these should all remain as-is. I think hasService for a collection is more appropriate to our application than the proposed isAvailableAt.

  • iesr:hasService
  • iesr:useRights
  • iesr:usesControlledList
  • iesr:interface
  • iesr:output
  • iesr:supportsStandard

2. RSLP Terms not included in IESR

2.1 Review after Pilot

[Decision: defer until next review.]

We left the decision about these for after the pilot. Is there any reason to include them? Has there been any stakeholder request?

This is really a question of the data model. Do we need to model the distinction between a Catalogue (a Collection of metadata records) and the Collection of items described by those metadata records? Those two collections may be made available by different Services with different access conditions; and the Collections themselves may have different conditions of use etc.

We need to look at real world examples to explore this and make any decision. I suggest we defer until next metadata review.

  • hasDescription
  • isDescriptionOf
  • hasPart

2.2 Other RSLPCD Properties

[Decision: no change - these terms will not be included.]

The following terms were not included. I've seen no request or reason to use them yet. I suggest we ignore them from now on unless we get any specific requests.

  • accumulationDateRange
  • strength
  • accrualStatus
  • custodialHistory
  • objectName (use subject)
  • agentName (use subject)
  • Collector
  • note
  • agentHistory
  • isLocationOf

3. Records in IESR

3.1 Requirements for Entities in IESR

[Decision: no change - this is already implemented.]

Entity descriptions are supplied separately. Thus rules are needed to decide which entities to compose into IESR records for indexing in the registry. The following rules are currently enforced by the registration and record composition software.

  • Every collection description in the IESR must be linked to at least one service description in the IESR.
  • Every hasService, owner, administrator must be registered in IESR for a collection to be indexed in IESR.
  • A Service in IESR that has no collection hasService pointing to it is presumed to be a transactional service. (Note we do ask suppliers to indicate this service type).
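
The rules above can be sketched as follows. This is an illustrative reading only, not the actual registration software: the class and function names are hypothetical, and it assumes that owner is a property of the collection and administrator a property of each linked service.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    identifier: str
    administrator: str = ""  # identifier of the administering agent, if any


@dataclass
class Collection:
    identifier: str
    has_service: list = field(default_factory=list)  # service identifiers
    owner: str = ""  # identifier of the owning agent, if any


def can_index(collection, services, agents):
    """Return True if the collection may be composed into an IESR record.

    `services` maps registered service identifiers to Service objects;
    `agents` is the set of registered agent identifiers.
    """
    # Rule 1: at least one linked service description.
    if not collection.has_service:
        return False
    # Rule 2: every referenced owner, service and administrator is registered.
    if collection.owner and collection.owner not in agents:
        return False
    for service_id in collection.has_service:
        service = services.get(service_id)
        if service is None:
            return False
        if service.administrator and service.administrator not in agents:
            return False
    return True


def presumed_transactional(services, collections):
    """Rule 3: registered services that no collection points to."""
    linked = {sid for c in collections for sid in c.has_service}
    return sorted(sid for sid in services if sid not in linked)
```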

3.2 Identifiers

[Decision: no change - this is already implemented.]

Global identifiers for collections, services and agents are assigned on registration by IESR.

Data suppliers using the initial templates need to use local identifiers to link their entities together. These are recorded in the IESR Meta-Registry but are not used in IESR. Referenced entities must have an IESR identifier to satisfy the requirements above.

Identifiers are of the form: http://purl.org/poi/iesr.ac.uk/<time>-<pid> where <time> is seconds since 1970 and <pid> is process ID at registration time. Is this still OK, given that it uses iesr.ac.uk?
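
As an illustration of the identifier form described above, the following sketch builds and parses such an identifier. The function names are hypothetical and this is not the IESR registration code; it simply assumes both <time> and <pid> are written as decimal integers.

```python
import os
import re
import time

IESR_PREFIX = "http://purl.org/poi/iesr.ac.uk/"
IESR_ID_PATTERN = re.compile(r"^http://purl\.org/poi/iesr\.ac\.uk/(\d+)-(\d+)$")


def make_identifier(now=None, pid=None):
    """Build an identifier from seconds since 1970 and a process ID.

    Both values default to the current time and the current process,
    but can be passed in explicitly (for example in tests).
    """
    seconds = int(time.time()) if now is None else int(now)
    process_id = os.getpid() if pid is None else int(pid)
    return f"{IESR_PREFIX}{seconds}-{process_id}"


def parse_identifier(identifier):
    """Return (seconds, pid) for a well-formed identifier, else None."""
    match = IESR_ID_PATTERN.match(identifier)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Note that two registrations handled by the same process within the same second would receive the same identifier under this scheme, so a real implementation would need to guard against that.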

4. Collection Metadata

4.1 Different Audiences

[Decision: defer until next review.]

Should we include the possibility of having several descriptions for different audiences? I suggest we defer this for now. It needs some thought about how to design it. It could be a future enhancement.

4.2 Suggested Additional properties

[Decision: 1 - defer until next review
2 and 3 - drop as out of scope of IESR.]

The following properties have been suggested. Should we include any of them?

  • 1. format of the items of a collection.
    PJ comment: I think this is a useful property for discovery/selection so probably should be included, but (as in DC CD AP) we need to find the right way of modelling/expressing this, as it is not the dc:format of the collection - it's the dc:format of an item.
  • 2. current funded life time of a service or project
    PJ comment: I think this refers to service as in "service provider" = Administrator of Service. i.e. it's a property of the Administrator, or maybe strictly speaking it's a property of the relation between Administrator and Service. Urgh, I think it's hard to model correctly. We could just fudge the detail and put a text field in the Service description, I suppose.
    AA comment: I suggest we drop this!
  • 3. usesIdentifier (akin to usesControlledList) for standard identifiers, eg for a service that uses ISSNs. [Request from EDINA]
    PJ comment: this should be taken care of by the proposed metadata schema registry

4.3 Problems from Stakeholders

[Decision: drop - this is a problem only in the short term.]

hasService. Currently the collection record needs updating whenever a new service is added. This was particularly seen as a problem where the supplier of the service differed from the supplier of the collection. (Note currently the registration software warns about this but doesn't prohibit it.) This isn't really a long-term problem - the proposed metadata editor should be able to deal with this.

5. Service Metadata

[Decision: No change - 1 (already implemented), 7
Change / add new property - 4
Defer until next review - 2, 3, 5, 6
No change - 8 and 9.]

  • 1. description. A collection-based service may have a description if it extends the description of the collection with information particular to that service. The service description is searchable for all services, not just transactional ones. This has already been implemented.

  • 2. dc:subject. Should we include this for a transactional service? I suggest we defer this for the next review after some use of IESR.

  • 3. Should different interfaces to a transactional service be correlated in some way? They are interfaces to the same functionality. In the current model they will appear as entirely separate services.

  • 4. service logo. Should we include it? This is just a URL that yields a graphic on a web page.

  • 5. output. Output is currently applicable to webcgi and openurl services only and is not searchable. This means you cannot currently search for, e.g., a service providing MARC records (unless this information is in the collection or service 'description'). For service types like Z39.50 this is currently hidden within a Zeerex file, which is for information not discovery. Should we have a searchable output field for all services? [Request from EDINA]

  • 6. dc:type/SvcTypeList. Has this been used? It is probably too soon to look into this. These service types were mainly for transactional services.
    We thought there would be a requirement to search for an OpenURL Resolver (i.e. an SFX-like one, not a link-to resolver - we may need a better term). But that is deducible as access-method=openurl and transactional service.

  • 7. supportsStandard. There is overlap between access method from AccMthdList and StdsList. The reason (1) we did this was to keep the access method list clean and simple for resource discovery. It soon becomes horrendously detailed if we include all the versions and profiles. I guess most searches would be for the simple access method - a portal could then look at supportsStandard if it wants more information. The other reason (2) for keeping the StdsList separate is that a service can have only one access method (a restriction that seems right) but it could support more than one version/profile of a standard. Eg. an openurl service could support both version 0.1 OpenURL and version 1.0/SAP1.

  • 8. accessRights. Are we following the DC definition? Should we reconsider? ("Information about who can access the resource or an indication of its security status.")

  • 9. Should we change our basic model so that a collection has locations, and a location has services? Does location matter for a digital collection? Generally users don't care where a collection is. This model seems more appropriate for physical collections where location would matter.
    I suggest that changing the model now would require far too much discussion and too many changes. Alternatively, we could simply understand that this is what we are modelling, but that we have conflated location and service for pragmatic reasons.

6. Agent Metadata

[Decision: change as detailed.]

  • Agent Role. What is this for? We need to keep it now because people like filling it in. It should not refer to any particular collection or service.

Proposal to ditch vcard. We need to capture an organisation (or individual) with a few essential details (and probably not contact names). Suggested:

Property                     Content                          Replaces       Obligation                      Repeatable       Searchable
dc:title                     Name of organisation or person   vcard:ORG/FN   mandatory                       not repeatable   searchable
dc:identifier                Global identifier                -              mandatory                       not repeatable   searchable
dc:description               Description of organisation      vcard:ROLE     optional                        not repeatable   -
iesr:email                   Contact email address            vcard:EMAIL    mandatory (for administrator)   not repeatable   -
iesr:phone                   Contact phone number             vcard:TEL      optional                        not repeatable   -
dc:relation with http URI    Agent URL                        vcard:URL      optional                        not repeatable   -
iesr:logo                    Logo                             -              optional                        not repeatable   -

Example:

<iesr:Agent>
<dc:title>MIMAS</dc:title>
<dc:identifier xsi:type="dcterms:URI">http://purl.org/poi/iesr.ac.uk/123456789-98765</dc:identifier>
<dc:description>MIMAS is a data centre...</dc:description>
<iesr:email>info@mimas.ac.uk</iesr:email>
<iesr:phone>tel:00441612756019</iesr:phone>
<dc:relation xsi:type="dcterms:URI">http://www.mimas.ac.uk</dc:relation>
<iesr:logo>http://www.mimas.ac.uk/images/someimg.gif</iesr:logo>
</iesr:Agent>
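
The obligations in the table could be checked mechanically. The sketch below is illustrative only: `check_agent` is a hypothetical helper, and the iesr namespace URI is a placeholder assumption, since this review does not state it. The supplied XML must declare its namespace prefixes for the standard-library parser to accept it.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
IESR = "http://example.org/iesr/"  # placeholder, NOT the real iesr namespace

# Properties that every agent description must carry.
MANDATORY = [f"{{{DC}}}title", f"{{{DC}}}identifier"]

# Every proposed agent property is 'not repeatable'.
NOT_REPEATABLE = [
    f"{{{DC}}}title",
    f"{{{DC}}}identifier",
    f"{{{DC}}}description",
    f"{{{IESR}}}email",
    f"{{{IESR}}}phone",
    f"{{{DC}}}relation",
    f"{{{IESR}}}logo",
]


def check_agent(xml_text, is_administrator=False):
    """Return a list of problems found in one agent description."""
    agent = ET.fromstring(xml_text)
    problems = []
    required = list(MANDATORY)
    if is_administrator:
        # Email is mandatory only when the agent is an administrator.
        required.append(f"{{{IESR}}}email")
    for tag in required:
        if agent.find(tag) is None:
            problems.append(f"missing {tag}")
    for tag in NOT_REPEATABLE:
        if len(agent.findall(tag)) > 1:
            problems.append(f"repeated {tag}")
    return problems
```

An empty list means the description satisfies the mandatory and not-repeatable constraints; the check says nothing about the values themselves.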

7. Administrative Metadata

[Decision: no change (already implemented) - 1, 2, 3
Defer until next review - 4.]

  • 1. rights. This field is mandatory, though it can be omitted for data supply. Contributors agree to a Creative Commons statement (licence for attribution required, non-commercial use, share-alike) for all the metadata records they supply to IESR.
  • 2. dc:publisher. Has been dropped - IESR is the publisher of the metadata.
  • 3. dcterms:modified. Set automatically on registration or update of a record. Thus any instances of this field in supplied data are ignored.
  • 4. audience. Should we include (to support descriptions for different audiences, eg HE/FE)?

8. Controlled Vocabularies

8.1 Service Access Methods

[Decision: change / add term - 1 (but needs investigation), 4
Defer until next review - 5
No change / don't add - 2, 3.]

The following have been suggested for addition by stakeholders:

I propose we add:

  • 1. OpenGIS. Needed for a web map server service. I think this is just another standardised webcgi access method, probably with an interface (like WSDL or Zeerex) and a supportsStandard entry (i.e. like OpenURL).

I propose we don't add:

  • 2. Xquery over SOAP - is this different from just SOAP?
  • 3. OAI over sets - does this need more coverage than just OAI-PMH?
  • 4. ftp
  • 5. rsync

8.2 Subject Schemes

[Decision: No change - At least one subject term, of any scheme, is mandatory and enforced by software
No change - Provision of at least one Dewey term is mandatory, but enforcement is procedural
Defer until next review - use of Dewey as required 'backbone' scheme]

Should at least one subject term be mandatory?
This is currently enforced by software checking. Some records have been supplied with no subject. I've added one just to load the data. I think we should require at least one 'local' term.

Should there be at least one subject term according to a prescribed scheme?
Yes if we want to provide useful "Select Collections by Subject" functionality
Should this scheme be Dewey? There are licence problems with Dewey.
Should we use JACS instead?

How many schemes should we include? Should we include more at data suppliers' requests?
Currently we have: DDC, HASSET, JACS, LCSH, MESH, UNESCO
Should we reduce this list? Is anyone using MESH?
Should we remove HASSET and ask suppliers who've used it to re-catalogue their terms? Note: HASSET was added at request of UK Data Archive. It is used extensively in UK HE Social Science. It is based on UNESCO.

Is JACS an appropriate scheme for subject? It is about course codes. Should it be an encoding scheme for an 'educationSomething' property?

8.3 usesControlledList

[Decision: change - 1 make usesControlledList optional
No change - 2 and 3.]

1. Should at least one usesControlledList be mandatory?
This was a HILT requirement.
The problem is that a lot of collections do not use a recognised controlled vocabulary. I've added to the Controlled Vocabularies List:

  • LOCAL - for data suppliers to indicate they use only a local vocabulary
  • NONE - this property was not included in supplied record

This doesn't seem terribly useful. Should we make the property optional?

2. Should the list of vocabularies be extended at data suppliers' requests?
It seems reasonable to include any quality schemes they use. Different schemes are used in different subject domains.

3. Should vocabulary scheme versions be included?
In my opinion 'no' - this is too fine-grained a detail for IESR.

8.4 Type

[Decision: defer until next review.]

The only scheme we have for collection type is CLDT. Is this OK? Do we need an IESR scheme? [Note: IESR adds DCMIType Collection or Service]

8.5 Dates

[Decision: defer until next review - wait for DC-Date proposals.]

Open issues: how to express B.C.E., geological, approximate and questionable dates.
I suggest we wait for DC-Date Working Group to make proposals on this. Until then we use some ad hoc guidelines.

8.6 hasAssociation

[Decision: drop - not really an issue.]

Should this be constrained to URL? Or to URI within IESR?
[If the latter, we had a query about how to find this URI - this functionality will be provided by software.]

8.7 Access List

[Decision: change to free text.]

This list was ad hoc and may be getting out of hand.
There is potential for mismatches in the long strings using the current templates. But it would be a select list in a future metadata editor.
Would it be better made repeated and more granular so that data suppliers could select multiple components?
Or is it best just to make it free text?
I prefer the latter. Will it be used for any more than just information?

8.8 Access Control List

[Decision: no change - 1 (but better definitions)
Drop - out of scope or not an issue - 2.]

1. This list currently uses very brief tokens.
Should these be longer, more explicit strings? Or is a brief token with a longer definition (I will improve these) sufficient?
Should 'none' be 'open'?

2. It may be too simplistic for a multi-client service. Eg with a single athens access point, then further restrictions depending on user type or ip. This is an EDINA request (DigiMap). Is IESR meant to go into that level of detail? In my opinion this example should be denoted as 'athens' because that is the initial service access point.

8.9 Authentication

[Decision: defer until next review - 1, 2, 4
out of scope - 3.]

1. We need to investigate requirements for further description of authentication requirements. I suggest we defer this to the next review. We may need to support Shibboleth but it is too soon yet to identify requirements.

2. Request to record Athens resource name (which is not linked to any other name in service) for services with AccessRights=athens. Is there really a need for this information in IESR? Shouldn't a portal ask Athens for this? We may need to ask EDINA for more details as to what use they expect to make of this. Is it one-to-one with service (or collection)? Thus is it an identifier (xsi:type="iesrAthensResourceName") or is it accessRights?

3. Request to record IP ranges that are automatically allowed access for services with AccessRights=ip. This seems to be outside the scope of IESR. It could involve huge maintenance tasks for someone. It should be dealt with by the resource itself.

4. Should we record Athens prefix as an institution identifier for a service, eg for an OpenURL resolver? We previously had this in, then discarded it in favour of DNS Domain. But is there a one-to-one mapping between the two? The OpenURL router is using Athens prefix as institution identifier.

8.10 Audience types

[Decision: defer until next review.]

Currently we have only education levels, and are using educationLevel (a sub-property of audience). Should we look at including audience types beyond educational ones, eg look at publishers' lists, MARC lists?

9. Stakeholder Requirements

[Decision: needs discussion - may result in metadata changes.]

Outstanding stakeholder requirements need to be considered. Decisions should be made on:

  • Those to be taken forward into the future development
  • Those that require further investigation or discussion with the stakeholders that made the request
  • Those that are out-of-scope or not sensible to implement