Research Data Australia has been set up to register collections. Related parties, activities and services in Research Data Australia provide context and meaning for the collections. The collections registered are most commonly datasets, but they can also be of "collection" type ("compiled content created as separate and independent works"), such as museum or archive collections; information collections, such as registries, catalogues and indexes; or aggregated collections, such as are found in repositories.
Research Data Australia is a collections registry: as long as the entity you intend to describe can be understood as a research collection, it can be described within Research Data Australia. That means that what is being described: is an aggregation of resources, and will be understood as a single aggregation of resources within its research context; is not exclusively a set of documents produced as the output of research, although it can certainly consist of documents that are the subject matter of research; and has Australian relevance, either through the involvement of Australian researchers or through Australian subject matter.
A collection may be described as a self-standing entity, or it may be related to other collections. The most common relations between collections are hierarchical, for example, where a collection is derived from another collection, or part of another collection. Both these relations are supported within Research Data Australia. Lateral relations between collections, such as "these collections have subject matter in common", "these collections are part of the same larger collection", "these collections have come out of the same research activity", and "these collections have the same primary collector" are not required in Research Data Australia. That is because faceted displays support this type of relation between collections, and the information is already captured in the system through the description of related objects. Step 13 explains relations between objects in Research Data Australia.
Collection records created to best practice standards will:
The steps below discuss issues that are specific to collections. See Collection for general information.
Collection type is required. There are seven collection types: catalogueOrIndex, classificationScheme, collection, dataset, registry, repository and software.
Keys are required and must be unique in Research Data Australia. The key identifies the collection metadata record in Research Data Australia.
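As a rough illustration of the two requirements above, the following Python sketch checks a batch of records before submission. The function name check_records and the (key, type) tuple layout are hypothetical and not part of Research Data Australia. A local check like this can only catch duplicates within your own batch, not across the whole registry.

```python
# The seven collection types listed in this guide.
COLLECTION_TYPES = {
    "catalogueOrIndex", "classificationScheme", "collection",
    "dataset", "registry", "repository", "software",
}


def check_records(records):
    """Return a list of problems found in (key, collection_type) pairs.

    Hypothetical helper: it only sees the batch it is given, so it
    cannot prove that a key is unique across the whole registry.
    """
    problems = []
    seen = set()
    for key, ctype in records:
        if not key:
            problems.append("missing key")
        elif key in seen:
            problems.append(f"duplicate key: {key}")
        seen.add(key)
        if ctype not in COLLECTION_TYPES:
            problems.append(f"unknown collection type for {key}: {ctype}")
    return problems
```

Type names are compared case-sensitively here, so "dataSet" would be reported as unknown.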
Collection names/titles should be as descriptive as possible. They should include keywords to provide context for non-specialist users, as well as information such as the nature of the data and spatial and temporal coverage. For example, a collection named "Pilbara" may be adequate in the context of a particular discipline database, but not in the general context of Research Data Australia. It would be more informative to provide a name like "Western Australian Geological Survey: Pilbara" or "Aboriginal Art Collection: Pilbara, 1950-1965".
Good quality collection descriptions will increase the chances of a collection being discoverable through search engines, as well as helping researchers decide if the collection is likely to be useful for them. The following principles are recommended:
At least one description of rights, licences and/or access rights for a collection is required in a collection record. Rights information supports the re-use of collections. A hyperlink can replace a rights, licence or access rights description.
Rights statement: a statement about the rights held in a collection. These may be intellectual property rights such as copyright or moral rights.
Licence: a legal statement giving official permission to do something with a collection. Use this element to describe the type of licence that applies to the data.
Access rights: a statement about access rights and access constraints for the collection, including who may access and when access may occur (including any embargo). Restrictions may be based on security, privacy or other policies. Preferably, also choose a type from the accessRights type vocabulary.
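The three kinds of rights information above could be assembled along the following lines. This is a minimal sketch using Python's standard library. The element names rightsStatement, licence and accessRights are assumed to match the RIF-CS spelling, the helper build_rights is hypothetical, and the type value "restricted" is illustrative only; check the accessRights type vocabulary before use.

```python
import xml.etree.ElementTree as ET


def build_rights(statement=None, licence=None, access=None, access_type=None):
    """Build a <rights> element holding whichever parts were supplied.

    Raises ValueError when nothing is supplied, since at least one
    description of rights, licence or access rights is required.
    """
    if not (statement or licence or access or access_type):
        raise ValueError("provide a rights statement, licence or access rights")
    rights = ET.Element("rights")
    if statement:
        ET.SubElement(rights, "rightsStatement").text = statement
    if licence:
        ET.SubElement(rights, "licence").text = licence
    if access or access_type:
        element = ET.SubElement(rights, "accessRights")
        if access_type:
            # Assumed attribute name for the accessRights type vocabulary.
            element.set("type", access_type)
        element.text = access
    return rights
```

For example, build_rights(licence="CC BY 4.0", access="Embargoed until 2026", access_type="restricted") yields a rights element with a licence child and an accessRights child.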
This is a Digital Object Identifier minted by ANDS for a collection in Research Data Australia. See this record in Research Data Australia
Other examples (fictional)
Display of multiple collection records in Research Data Australia
Where two or more collection records, from the same or different data sources, share common identifiers, the records are treated as describing the same collection.
In Research Data Australia, the records are merged into a single search result and links to each of the merged records are displayed on the view page of each record.
This feature of Research Data Australia is described in detail in Step 8 of Best practice for creating party records. The description and examples on this page apply equally to multiple collection records.
Note: “local” identifiers are not used to link multiple records together.
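The merging rule can be pictured as grouping records that share an identifier. The sketch below is one possible reading, not the Research Data Australia implementation: it assumes two identifiers match when both type and value are equal, and the function name merge_groups and the input layout are hypothetical.

```python
from collections import defaultdict


def merge_groups(records):
    """Group record keys that share at least one non-local identifier.

    `records` maps a record key to a list of (identifier_type, value) pairs.
    """
    parent = {key: key for key in records}

    def find(k):
        # Follow parent pointers to the group representative.
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    first_seen = {}
    for key, identifiers in records.items():
        for id_type, value in identifiers:
            if id_type == "local":
                continue  # local identifiers never link records together
            ident = (id_type, value)
            if ident in first_seen:
                parent[find(key)] = find(first_seen[ident])
            else:
                first_seen[ident] = key

    groups = defaultdict(list)
    for key in records:
        groups[find(key)].append(key)
    return sorted(sorted(group) for group in groups.values())
```

Records that share only a "local" identifier stay in separate groups, in line with the note above.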
Use the dates element to record dates in a collection's data management lifecycle. Use of this element provides additional options for discovery of and access to data collections by date. It is particularly useful where the citationMetadata element is not supplied or the date attribute is missing from the data collection description.
Choose from a number of date types to provide one or more instances of the element to describe events such as date created, date submitted or date published. A single date or date range can be provided.
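A dates element with a single date or a date range might be built as follows. This is a sketch under assumptions: the type value "dc.created", the child types "dateFrom" and "dateTo", and the "W3CDTF" date format label are recalled from RIF-CS and should be checked against the schema, and build_dates is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_dates(event_type, date_from, date_to=None):
    """Build a <dates> element for one lifecycle event.

    Supplying only date_from gives a single date; supplying both
    gives a date range.
    """
    dates = ET.Element("dates", {"type": event_type})
    start = ET.SubElement(dates, "date", {"type": "dateFrom", "dateFormat": "W3CDTF"})
    start.text = date_from
    if date_to is not None:
        end = ET.SubElement(dates, "date", {"type": "dateTo", "dateFormat": "W3CDTF"})
        end.text = date_to
    return dates
```

Calling the helper once per event (created, submitted, published) gives one instance of the element for each.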
Note: do not confuse dates (collections) with other date elements
Collection location enables users to access the collection. This may mean direct access to the collection or mediated access via a contact person or organisation. Appropriate locations may include an electronic address or a physical address. Electronic addresses are preferred over physical addresses. Spatial location is less commonly used, since electronic and physical addresses fulfil the purpose of enabling access more directly.
Note: do not confuse spatial location with spatial coverage.
An appropriate electronic address for a collection is a URI to a landing page in a repository and/or an email address of a person or organisation which can respond to enquiries about access. The electronic address may also lead directly to download of the collection. A physical address for the researcher's office/research centre might also be appropriate, particularly if access is mediated.
Spatial location describes where a collection is physically located, using geospatial coordinates such as latitude and longitude. This may be useful for physical collections such as museums and archives.
Currency of location information (date ranges)
Only use "Date From" or "Date To" for collection location information if you need to describe a period of time during which the location information was current. Date ranges should only be used where the address has changed and older addresses have been recorded in the metadata being provided.
Temporal and spatial coverage describe the locations in space and time to which collections relate: they are the location or time that something is about, not the location or time where something is.
Note: do not confuse spatial coverage with spatial location.
Collection records are connected to activity, (other) collection, party and service records by including the keys for those other records in the relatedObject element.
In this example, the key for a party is included in the collection's relatedObject element. The meaning here is "this collection is managed by this party".
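A relatedObject element carrying that meaning could look like the output of the sketch below. The relation type "isManagedBy" and the child element names key and relation are assumed from RIF-CS; the party key used in the usage note is fictional, and related_object is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def related_object(key, relation):
    """Build a <relatedObject> pointing at another record by its key."""
    element = ET.Element("relatedObject")
    ET.SubElement(element, "key").text = key
    ET.SubElement(element, "relation", {"type": relation})
    return element
```

For instance, related_object("example.edu.au/party/42", "isManagedBy") expresses "this collection is managed by this party" for a fictional party key.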
Use the primary relationship opt-in function in the Data Source Account configuration page to link all records within your data source to a party record for your organisation. This will allow all your organisation's collections to be discoverable via the party record describing your organisation.
ANDS infers and displays bi-directional links between related objects in Research Data Australia. If a collection links to a party within the same data source, the party record does not need to link back to the collection; ANDS will display the inferred reverse link in Research Data Australia. If the party and collection are from different data sources, ANDS will only display the inferred reverse link if the receiving partner has opted in to allow bi-directional links.
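The inference rule described above can be sketched as follows. This is an illustration only, not the ANDS implementation: the inverse relation names are assumed from the RIF-CS vocabulary, "receiving partner" is read as the data source of the record being linked to, and the function and parameter names are hypothetical.

```python
# Assumed relation/inverse pairs; check the RIF-CS vocabulary before relying on them.
INVERSE = {
    "isManagedBy": "isManagerOf",
    "isOwnedBy": "isOwnerOf",
    "isOutputOf": "hasOutput",
    "isPartOf": "hasPart",
    "isDerivedFrom": "hasDerivedCollection",
}


def inferred_reverse_links(links, source_of, opted_in):
    """Return the reverse links that would be displayed.

    links     -- list of (from_key, relation, to_key) supplied by contributors
    source_of -- maps each record key to its data source
    opted_in  -- data sources that allow bi-directional links from other sources
    """
    reverse = []
    for src, relation, dst in links:
        inverse = INVERSE.get(relation)
        if inverse is None:
            continue  # no known inverse, nothing to infer
        same_source = source_of[src] == source_of[dst]
        if same_source or source_of[dst] in opted_in:
            reverse.append((dst, inverse, src))
    return reverse
```

A link within one data source always yields a reverse link; a link across data sources yields one only when the target's data source has opted in.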
For manually supplied records, ANDS requires partners to provide links in both directions, to familiarise themselves with the link structures involved.
A collection must be related to at least one party. This is to allow discovery of the collection through the parties responsible for it, and to provide a contact point for queries from users.
If multiple parties within the institution have made a substantial contribution to the collection, the collection is related to all those parties.
For external collaborators who have made a substantial contribution to the collection, party records can be sourced through Trove's People and Organisations zone or by linking to existing records in Research Data Australia. Those records are then also included as related parties.
Hierarchical relations between collections are important to describe if they provide essential context for a collection. Lateral relations between collections are not usually necessary.
A collection should be related to an activity if it is the output of a well-defined funded project. If the data collection is the output of a research project you have described as an activity record, then relate the data collection to the activity record using the key of the activity record and the relation "isOutputOf".
If the collection is the output of an ARC or NHMRC grant, use the relatedInfo element (see Step 15).
A collection should be related to a creation service if it is produced by an instrument or software, and other collections are also likely to be produced by the same instrument or software. This is so that collections produced through the same service can be related for discovery.
A collection should be related to a discovery service if it is exposed for discovery via a particular machine protocol, other than the keyword search found by default in portals.
Provide a subject to allow Research Data Australia to associate a collection with a research field, and, indirectly, with other collections in the same field.
The subject represents the primary topic or topics covered by the collection.
If you provide any subjects, you must provide a subject from the Australian and New Zealand Standard Research Classification (ANZSRC) 2008. This is used as a common subject vocabulary across Research Data Australia. ANZSRC "Field of Research (FOR)" codes should be used whenever available. ANZSRC "Type of Activity (TOA)" and/or "Socio-economic Objective (SEO)" codes may also be used.
Preferably, terms from other vocabularies (e.g. LCSH, MeSH) and/or local subjects (keywords) should be used in addition to the ANZSRC codes. Under RIF-CS v1.3.0, Linked Data URLs for vocabulary terms should be provided where available.
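The subject rule (any subjects supplied must include at least one ANZSRC subject) can be checked with a few lines of Python. The subject type strings below are assumed spellings of the RIF-CS subject types, and subjects_ok is a hypothetical helper.

```python
# Assumed type strings for ANZSRC FOR, SEO and TOA subjects.
ANZSRC_TYPES = {"anzsrc-for", "anzsrc-seo", "anzsrc-toa"}


def subjects_ok(subjects):
    """Check a list of (subject_type, value) pairs against the ANZSRC rule.

    An empty list passes: the rule only applies once any subject is given.
    """
    if not subjects:
        return True
    return any(subject_type in ANZSRC_TYPES for subject_type, _ in subjects)
```

A record with only local keywords fails the check; adding one ANZSRC code alongside the keywords makes it pass.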
Include links to related information which provides research context for understanding the collection. Related information types are "publication", "website", "reuseInformation", "dataQualityInformation" and "metadata". In addition, the implementation of RIF-CS v1.5.0 (November 2013) expanded the intended usage of the relatedInfo element to include linking to “activities”, “collections”, “parties” and “services”.
An example of related information is a publication resulting from the data in the collection. Include the identifier and title of the publication. Also include relation, description and URL elements, introduced in RIF-CS v. 1.5.0.
Another example is where the collection is the output of a research grant already in RDA. The identifier for the grant is the PURL identifier (e.g. http://purl.org/au-research/grants/arc/DP130101968) and the relation is "isOutputOf".
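The grant example could be expressed roughly as in the sketch below. The relatedInfo type "activity", the identifier type "purl" and the order of child elements are assumptions that have not been checked against the RIF-CS v1.5.0 schema; grant_related_info is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def grant_related_info(purl, title=None):
    """Build a <relatedInfo> element linking a collection to a grant."""
    info = ET.Element("relatedInfo", {"type": "activity"})
    ET.SubElement(info, "identifier", {"type": "purl"}).text = purl
    ET.SubElement(info, "relation", {"type": "isOutputOf"})
    if title:
        ET.SubElement(info, "title").text = title
    return info
```

Passing the PURL from the example above produces a relatedInfo element whose relation type is "isOutputOf".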
If a collection is published, the citation for the published collection should be included.
Citations support the re-use of and long-term access to collections. The full citation gives a dataset citation in a single full text string, while citation metadata gives a dataset citation split into machine-readable component parts that referencing software can import from Research Data Australia.
Citations for collections should include a URI or other resolvable identifier (e.g. DOI).
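To show how the two citation forms relate, the sketch below assembles a full citation string from component parts. The layout "Creators (Year): Title. Publisher. Identifier" is only one plausible convention chosen for illustration, not a format prescribed by this guide, and full_citation is a hypothetical helper.

```python
def full_citation(creators, year, title, publisher, identifier):
    """Join citation components into a single full-text citation string.

    Raises ValueError without an identifier, since citations should
    include a URI or other resolvable identifier.
    """
    if not identifier:
        raise ValueError("a resolvable identifier (e.g. a DOI) is expected")
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. {identifier}"
```

The component arguments correspond to the machine-readable citation metadata; the returned string corresponds to the full citation.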
|1 Feb 2012||New, separate and expanded Collection Best Practice page|
|2 Nov 2012||Added "metadata" as a type in Step 13: Related Info|
|20 Nov 2012||Added dates (collections) as Step 14|
|9 May 2013||Reordered steps to align with Release 10 interface changes|
|26 November 2013||Updated Related Info to include information about the element according to RIF-CS v1.5.0|
|28 March 2014||Added information at Step 9, about the display of multiple collection records in Research Data Australia in Release 12|
|15 May 2014||Modified contents; modified Step 3 to include information about what best practice means|
|31 July 2015||Updated information|
The purpose of a party record in Research Data Australia is to support discovery of research data collections and to provide context to those collections.
ANDS collaborated with the National Library of Australia to provide infrastructure for describing parties using the Trove service. All ANDS partners can use this infrastructure to create party records which can be harvested into Research Data Australia from Trove. See Trove and TIM and the ARDC Party Infrastructure for detail.
At least one party record must be related to each collection described in Research Data Australia (see Metadata Content Requirements).
If a record is already available for your party you do not need to create a new party record. Link to the existing record by including either the Research Data Australia party record key or the NLA Trove party identifier in the related object element of your records.
Provide your own description of the person or group in a separate party record. You cannot edit other partners' records, including the National Library of Australia’s (NLA) records.
See Step 8 for how to make sure that both your record and the existing record are treated as describing the same party.
Note: Most partners will need to contribute party records, because many researchers may not have records yet, or the records that do exist may be inadequate.
External researchers including international researchers
ANDS has altered its position on creating party records for researchers external to your research institution. Organisations can now create party records for Australian and international researchers external to an organisation or independent of any organisation, where these records do not already exist. The organisation should perform a search for such existing records before proceeding to creation (see Step 2). If a new party record is created, the record-creating organisation needs to take responsibility for including the name of the employing institution, if available.
ANDS suggests that you work with collaborating institutions to ensure that all researchers responsible for the generation of collections are appropriately acknowledged by having party records in Research Data Australia. The ARDC Party Infrastructure Project is intended to help partners access party records for external researchers.
At least one party should be related to a collection (see Metadata Content Requirements). If multiple parties can be related to a collection, the description should aim to link the collection to any party that will improve discovery substantively.
That means all active known collaborators on the project.
As a default, that would include all named researchers on the research grant application, but not support staff or research assistants.
Organisations are hierarchically organised, and you may have a choice of hierarchical level at which to represent your party (group). For example, you may link your collection to your research lab, to your department, your faculty, or your research institution. Which level of group is represented as responsible for collections is a matter of institutional policy.
Remember that Research Data Australia is not intended to represent group hierarchies. The default approach should be to represent the lowest-level group, with the most direct engagement with the collection. That means that the research lab or individual researchers are the parties of interest that need to be described and linked to a collection, in preference to departments or faculties. The name of a research lab, for example, may include its superior body's name as part of its name, e.g. Budawang University, Frontiers of Chemistry Research Laboratory.
Research Data Australia groups an institution's information for display using the Group attribute rather than requiring connection of all collection records to an institutional party record. However, contributors may make such connections if they wish.
Disambiguation: A party of type "group" is not the same as and has no relationship with the "group" attribute to which a Registry Object is linked for purposes of display in Research Data Australia. The "group" attribute is also used as the basis for enhanced contributor home pages. See Group for more information about this attribute.
Search to see if your party is already described before adding a new party record.
Where to search:
Type of party is required. There are three party types: person, group and administrativePosition.
Administrative position is a kind of party where the position name and contact information are present but the identity of the party filling the role is not specified.
Many data sources provide ANDS with party names such as "Data custodian", "Data Officer", or "Data Manager". There are over 1000 collection records (about 10% of the records in Research Data Australia) that contain role names for party records. Such role names are common in large data management environments.
Remember, a party of type "group" is not the same as and has no relationship with the "group" attribute.
Keys are required in party records. Keys must be unique in Research Data Australia. More information on creating keys for parties.
Collection records link to party records using the Research Data Australia party record key. Alternatively, collection records link to the NLA party records using the NLA party identifier (see Step 8 for more about identifiers).
Names for parties should be described by recording each name component in a separate name part. The type of name part is described by choosing from the following:
Only use Date range if the name has changed over time and older versions of the name have been recorded in the metadata being provided, such as when a research centre has changed its name since the related dataset was created.
Activity records in Research Data Australia (RDA) enable the description of research projects and programs as well as research grants and funding programs. Data collections are often the output of research activity and the description of related projects or grants can provide additional context. Since April 2015, RDA has offered users a specialised search option that enables the exploration of research activity in Australia. ARC and NHMRC research grants are recorded in RDA as activity records.
|7 Jul 2012||First web publication as separate page (previously part of activity page)|
|12 April 2013||Included more extensive information about existenceDates and the differences between grant dates and project dates|
|9 May 2013||Reordered steps to align with Release 10 interface changes|
|28 March 2014||Updated Step 8 Identifiers with information about the display of multiple activity records for Release 12|
|19 June 2014||Content reviewed with minor changes.|
|14 April 2015||Content updated to reflect changes implemented with Release 15|
|31 July 2015||Content updated to reflect changes implemented with Release 15|
Services in the research domain support the creation or use of research collections and datasets.
ISO 2146 defines a service as 'a system (analogue or digital) that provides one or more functions of value to an end user'. Services can be web services, provided across the web and following a well-defined machine protocol, such as OAI-PMH Harvest or RSS Syndication; but they may also be provided by offline software (e.g. the functionality of software running a simulation, or creating annotations).
As with parties and activities, the ANDS Collections Registry gathers service descriptions in order to provide context for the collections it registers, and to enable discovery of related collections, rather than to serve as an exhaustive registry of research services. For that reason, the services described in the registry are usually related to collections—whether the service exposes the collection, or was involved in creating the collection.
To be used, a service must be implemented. Therefore, a service must have a specific delivery method which makes it available to a client.
Delivery Methods include:
Web services are the most straightforward type of service to model: the definition of their function and scope is specified through statements of behaviour and data representation, and they have a well-defined protocol for interaction with service clients. These protocols can usually be indicated through the service type.
Other types of service are used to model instruments, software, and workflows. These tools often do not have well-defined protocols for interaction, so protocols need not be specified in their service description. These tools also have properties which are not captured by modelling them as services (e.g. asset numbers, operating systems): this partial representation is deliberate, because of the restricted scope of service descriptions.
Service descriptions in the ANDS Collections Registry are meant to convey only high-level, indicative information. More complete detail about data collection provenance should be provided in local metadata stores, and linked to as Related Info from the service description.
Instruments are modelled as offline services—although strictly speaking what services model is the capability of instruments to create data collections. Instruments are often housed in facilities, but facilities should be modelled in the ANDS Collections Registry as parties: they are the organisations which own the instruments. Instruments can be composed of individual sensors; both the large-scale and more fine-grained instrument may be of interest to users. Instruments can be related to each other in a partOf relationship. For example, a specific detector can be part of a Synchrotron beamline instrument, or of a radio telescope.
Whether to model both the instrument and its component sensors in the ANDS Collections Registry depends on whether it will be useful to discover collections through sensors, rather than just through the instrument. This is a policy decision for partners; some partners have already elected not to do so. The details of sensors used to gather the data should at any rate be recorded in local metadata stores.
To be used, a service must also be instantiated: there must be a particular instance of the service being described, rather than the class of all matching services, and it should be possible to name the location of the service, and the parties managing the service. For example, the ANDS Collections Registry would describe the Monash University ARROW repository OAI-PMH feed, rather than giving a generic description of the OAI-PMH protocol.
Treating services as instances means that there may be many service records in the ANDS Collections Registry that look quite similar—distinct sensors, for example, or distinct deployments of RSS. As long as each instance is associated with a collection registered with the registry, it is still appropriate to distinguish between the service instances.
Create a collection record of type="software" to describe downloadable software for a service, rather than an instance of the software running on a specific machine. In some cases, it may be appropriate to create a service record to describe an instance of software as well as a collection record to describe downloadable software for the service. Separate records would be expected for different versions of the same software, or for different implementations.
Depending on how services relate to collections, services can be classified as Creation services, Metadata services, Discovery services, or Reuse services.
Discovery services are typically web services; creation services typically have other delivery methods. The service type is described by choosing from the following:
The kind of service (service type) is described by choosing from the following (ANDS is currently considering expanding this list):
The service names for creation & metadata services are deliberately generic (and are taken from the e-Framework, which is not research-specific). To apply them, use the following:
What is the input into the service?
No reuse services have been included in the current service type vocabulary. The service type vocabulary can be expanded as the community requires.
Services may also have access policies. These are described in a separate element. More information
Researcher Fred from Notre Dame University uses the Brahe interferometer on the Farnell Radio Telescope to gather observations on pulsar THX-1138. The observations are registered with ANDS as a collection.
The pulsar data collection represents raw data. The Tempo2 pulsar timing software is used to extract pulsar timing data from a range of observations, including THX-1138, and the resulting analyses are also registered with the ANDS Collections Registry.
The pulsar data collection is exposed for search through the SRU protocol. The web service allowing this search is hosted at the University of Launceston.
The following diagram illustrates the relations of the objects described in this scenario:
The date on which the metadata describing a service was last changed in the source system can be recorded. See Date modified.
Metadata records describing services are grouped together on the Research Data Australia home page. The service category and service type are displayed. The hyperlink to a page or XACML document describing service access policies is displayed. Date modified is not displayed. All information is searchable.
Often a collection is tightly bound with its discovery service, so there can be confusion about whether to model it as a collection or a service. The purpose of the ANDS Collections Registry is to promote the discovery of collections, not of services. So an entity such as a repository or portal must have a relevant collection description contributed to the registry. It can also have a relevant service description contributed, if that service description adds sufficient value. A discovery service that does not provide access to a specific collection is not relevant to the ANDS Collections Registry, and likely needs to be modelled differently.
For example: a podcast is a collection of recordings, combined with a syndication service for accessing that collection. The podcast should be described for ANDS as a collection, since that is the aspect of the podcast most relevant to the Collections Registry. The RSS feed to the podcast can be added to the Collections Registry as an associated discovery service (syndicate-rss). But the podcast should not be described as a service instead of a collection.
HTTP-Search for a single keyword can be assumed as default search functionality for a collection. (This is the single search box on the home page of most collections.) If the ANDS Collections Registry already has a description of such a collection, then a single-keyword search need not be registered in the ANDS Collections Registry as a distinct service description.
Portals provide access to an aggregation of collections. A portal can be modelled as either a service or as a collection; if it is modelled as a service, its constituent collections should also be described in the ANDS Collections Registry.
The service type is a two-part string, with the first part specifying the service genre and the second part specifying the protocol (for example, syndicate-rss, harvest-oaipmh, search-sru). For creation and metadata services, which do not have generically used protocols, only the service genre is specified.
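The two-part structure can be illustrated with a small parser. This is a sketch only: parse_service_type is a hypothetical helper, and it splits on the first hyphen without checking the result against the service type vocabulary.

```python
def parse_service_type(service_type):
    """Split a service type into (genre, protocol).

    The protocol is None when only a genre is given, as for creation
    and metadata services.
    """
    genre, separator, protocol = service_type.partition("-")
    if not genre:
        raise ValueError("service type must start with a service genre")
    return genre, (protocol if separator else None)
```

For example, "harvest-oaipmh" splits into the genre "harvest" and the protocol "oaipmh", while a genre-only type such as "create" has no protocol part.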
If there is a well-defined protocol for an instance of a creation or metadata service, the service description should provide that protocol information in the Related Info element. Added protocol information should also be provided in the Related Info element for discovery services, if there are local extensions to the service protocol that service users need to know.
The value for the service genre is taken from the set of service genres registered with the e-Framework. The protocol is taken from known services identified by initial Collections Registry content providers. New genre-protocol combinations may be added on application to the RIF-CS schema manager (contact firstname.lastname@example.org).
Software tools can have multiple types applicable out of the service type vocabulary: unlike web services, software tools can perform multiple functions. However the service description of software tools shall have a single type, reflecting the primary use of the tool in the research community.
For web services, the electronic address is a URI that provides access to the service: in particular, it is a URI that can be processed by a client following the service protocol (service endpoint).
If the service is syndicate-rss, for example, the location in the service description will be a URI that can be processed by an RSS reader.
Web services alone may use the <arg> element in addition to the <value> element, to differentiate between a base URL and the service arguments. This only applies to HTTP Query services, in which the service call URL contains service arguments. The <arg> element indicates whether each of the URL arguments is required or optional, whether they are plain text or embedded objects, and whether they are inline (embedded in the base URL) or key-value pairs in a HTTP query. The <arg> element does not describe the semantics of the arguments, and should not be treated as a substitute for linking to protocol documentation for the service.
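The distinction between a base URL and its service arguments can be shown by splitting an example service call with Python's standard library. The URL in the usage note and the helper name split_service_url are illustrative assumptions. Whether each argument is required or optional cannot be derived from a URL and still has to come from the service documentation.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def split_service_url(url):
    """Split a service call URL into (base_url, argument_names).

    Only key-value arguments in the HTTP query are detected; inline
    arguments embedded in the base URL are left in place.
    """
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    names = [name for name, _ in parse_qsl(parts.query, keep_blank_values=True)]
    return base, names
```

For instance, split_service_url("https://example.org/sru?operation=searchRetrieve&query=pulsar") separates the base URL "https://example.org/sru" from the argument names "operation" and "query".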
If the electronic address type is "wsdl", the <value> element must be a URL pointing to the WSDL file. Human-readable descriptions of the service online should be recorded in the Related Info element instead. A physical address or electronic address (email) can be provided as a contact for arranging access to the service. Typically this will be the same address as for the party managing the service.
For software and workflows, the electronic address is likewise a URI that provides access to the service. A physical address or electronic address (email) can be provided as a contact for arranging access to the service.
For offline services, a web address is not acceptable as a location. That is because an instrument home page does not provide direct access to the service, the way an RSS feed address or a search query does. Web pages about the service should be recorded in the Related Info element, just as they are for online services. A physical address or electronic address (email) should be provided instead; as above, the physical address is intended to allow users to gain access to the offline service (contact address).
Delivery Method will be suggested for inclusion in future versions of RIF-CS. As an interim measure, include the delivery method as a string without spaces (webservice, software, offline, workflow) in a description element of type "deliveryMethod".
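The interim measure could be applied as in the sketch below, which accepts only the four strings listed above. The helper name delivery_method_description is hypothetical.

```python
import xml.etree.ElementTree as ET

# The four delivery method strings named in this guide.
DELIVERY_METHODS = {"webservice", "software", "offline", "workflow"}


def delivery_method_description(method):
    """Build a <description type="deliveryMethod"> element."""
    if method not in DELIVERY_METHODS:
        raise ValueError(f"unknown delivery method: {method}")
    element = ET.Element("description", {"type": "deliveryMethod"})
    element.text = method
    return element
```

Values with spaces or different capitalisation, such as "Web service", are rejected rather than normalised.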
Where two or more service records, from the same or different data sources, share common identifiers, the records are treated as describing the same service.
In Research Data Australia, the records are merged into a single search result and links to each of the merged records are displayed on the view page of each record.
This feature of Research Data Australia is described in detail in Step 8 of Best practice for creating party records. The description and examples on this page apply equally to multiple service records.
Note: “local” identifiers are not used to link multiple records together.
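The grouping behaviour described above can be sketched in Python as follows. This is an illustration only, not the Research Data Australia implementation: the record structure and the function name are hypothetical, and the sketch assumes that two identifiers match when both their type and value are equal.

```python
from collections import defaultdict


def group_records(records):
    """Group record keys that share at least one non-local identifier.

    records: dict mapping a record key to a list of
    (identifier_type, identifier_value) pairs.
    """
    parent = {key: key for key in records}

    def find(key):
        # Follow parent pointers to the representative of the group.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    first_seen = {}
    for key, identifiers in records.items():
        for id_type, value in identifiers:
            if id_type == "local":
                continue  # "local" identifiers never link records together
            ident = (id_type, value)
            if ident in first_seen:
                parent[find(key)] = find(first_seen[ident])
            else:
                first_seen[ident] = key

    groups = defaultdict(set)
    for key in records:
        groups[find(key)].add(key)
    return sorted(sorted(group) for group in groups.values())
```

Records that end up in the same group correspond to a single merged search result; every other record stays on its own.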
Most of the relations described below are bidirectional; for discovery to be most effective, they should be represented in RIF-CS in both directions. In particular, if a collection links to the creation service that produced it, the creation service should also link out to all the collections it has produced. This allows discovery of more collections.
Often information on relations is only available in one direction: the description of a collection will link to the service that produced it, but the description of the service does not have access to the collections that the service has produced. In such cases, it is desirable for ANDS to automatically generate bidirectional links between the objects. This functionality is forthcoming.
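Until such functionality exists, a contributor could complete one-directional links before export. The Python sketch below shows the idea under stated assumptions: the function name and the tuple layout are hypothetical, and the inverse table contains only relation pairs named on this page, so it is not a full RIF-CS vocabulary.

```python
# Relation pairs named on this page; not a complete RIF-CS vocabulary.
INVERSE = {
    "hasPart": "isPartOf",
    "isPartOf": "hasPart",
    "supports": "isSupportedBy",
    "isSupportedBy": "supports",
    "isDerivedFrom": "hasDerivedCollection",
    "hasDerivedCollection": "isDerivedFrom",
}


def with_reverse_links(links):
    """Return the links plus the reverse of every link with a known inverse.

    links: iterable of (source_key, relation, target_key) tuples.
    """
    completed = set(links)
    for source, relation, target in list(completed):
        inverse = INVERSE.get(relation)
        if inverse is not None:
            completed.add((target, inverse, source))
    return completed
```

Relations that are missing from the table are passed through unchanged, so the sketch never invents an inverse it does not know.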
Currently the only relation modelled between services is hasPart/isPartOf. Creation services can often be modelled as part of another creation service, as with sensors and instruments, or individual services and service workflows. Metadata and Discovery services, on the other hand, are not normally modelled as forming part of other services.
Service descriptions must have a relationship to at least one collection. Depending on the service type, services and collections can have the following relations:
The supports/isSupportedBy relation is generic; the other relations are specialisations of this relation.
If a transform or assemble service is used to change collection A into collection B, the service operates on input collection A, and produces output collection B. (For collection discovery, the produces relation is more important than the operates on relation.) Collection A and collection B are related through the relation isDerivedFrom/hasDerivedCollection. This relation is distinct from partOf: if a collection is derived from another collection, the output is a new collection, and is not considered part of the old.
If service A is part of service B, and service A is related to a collection, then service B should not also be modelled as having the same relation to the collection. It is best practice in information science to link only to the most detailed level. For example, a collection would be linked only to the Brahe interferometer, and not to both the Brahe interferometer and the Farnell telescope. Users should navigate down from the Farnell telescope to discover collections associated with individual receivers.
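The "most detailed level" rule can be checked mechanically. The following Python sketch drops a link from a containing service whenever one of its parts already carries the same relation to the same collection. The function name, the record keys, and the `part_of` mapping are hypothetical.

```python
def most_detailed_links(links, part_of):
    """Remove links that repeat, on a containing service, a link held by a part.

    links: iterable of (service_key, relation, collection_key) tuples.
    part_of: dict mapping a service key to the service it is part of.
    """
    links = set(links)

    def ancestors(service):
        # Walk up the isPartOf chain; the 'seen' check stops accidental cycles.
        seen = set()
        while service in part_of and part_of[service] not in seen:
            service = part_of[service]
            seen.add(service)
        return seen

    redundant = set()
    for service, relation, collection in links:
        for ancestor in ancestors(service):
            redundant.add((ancestor, relation, collection))
    return links - redundant
```

With the interferometer modelled as part of the telescope, a link from each to the same collection reduces to the interferometer link alone.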
The following relations can be modelled between parties and services:
The relationship between a facility and its instruments is modelled through the isOwnerOf relation.
Note that the owner of a service is distinct from the owner of the associated collection. In the example above, the Norfolk Island Astronomical Commissariat owns the telescope that captured the pulsar data, but the pulsar data itself is owned by Notre Dame University.
No relations are currently modelled between services and activities. The existing relations isOutputOf and isFundedBy between activities and collections could be extended to services. However, this level of detail is beyond the requirements of the ANDS Collections Registry, and is appropriate instead for a services registry.
The relation hasAssociationWith, as with other registry object classes, allows an unspecified relationship to be signalled between the service and the target object.
| Date | Change |
|---|---|
| April 2010 | Consultation draft |
| 26 October 2010 | First web publication |
| 25 January 2011 | Complete revision to add creation and metadata services |
| 14 April 2011 | Added link to Access Policy (services only) page |
| 28 March 2014 | Add information to the best practice section, about the display of multiple service records in Release 12 |