Preparing RDF Data for Loading into EDG

If you already have data in RDF format, this section of the guide will help you load it into EDG. 

EDG separates ontologies – asset collections that define schema using classes, properties and shapes – from other types of asset collections, which contain “instance data” described by the ontologies. RDF files you load into EDG should follow this separation. It is especially important that files with instance data do not contain schema definitions. If they do, you would need to load them into EDG ontologies and then, within EDG, move the instance data into another type of collection. Depending on the size and complexity of your data, it may be simpler to process the files to enforce this separation before loading.
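
For example, if a file mixes schema and instance data, a SPARQL CONSTRUCT query along the following lines can pull out the schema triples so they can be saved and loaded as a separate ontology. This is a minimal sketch that assumes the schema uses the standard RDFS, OWL and SHACL vocabularies; adjust the list of types to match your data:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX sh:   <http://www.w3.org/ns/shacl#>
# Copy every triple whose subject is declared as a class, property or shape,
# so the result can be saved as a separate schema (ontology) file.
CONSTRUCT { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  ?s a ?type .
  FILTER (?type IN (owl:Class, rdfs:Class, rdf:Property,
                    owl:ObjectProperty, owl:DatatypeProperty, sh:NodeShape))
}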

Prior to loading RDF data you will need to decide how to organize it in EDG – how many asset collections you will have, what each one will contain, etc.

You may decide to follow the partitioning approach implemented in your files, i.e., create a separate EDG asset collection for each file. Or you may want to combine some of the files into a single asset collection in EDG. If you do the former and your files reference each other using owl:imports statements, these statements will no longer work, because the base URI of an asset collection in EDG will differ from the base URI of the file you are loading. After you create an asset collection for each file, use Settings>External Graph URI to help EDG transform these owl:imports statements. This must be done prior to loading data.

If you decide to combine multiple files into a single asset collection in EDG, then prior to loading, remove any owl:imports statements that cross-reference the files you are combining.
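
For example, a SPARQL UPDATE along these lines removes such a statement before the load. This is a sketch only; both URIs are hypothetical placeholders for your own ontology URIs:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
# Remove the cross-reference between two files that will be merged into one
# asset collection; replace the URIs with those used in your own files.
DELETE DATA {
  <http://example.com/vocab/products> owl:imports <http://example.com/vocab/units> .
}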

You may also keep some files as files in the EDG workspace, without physically loading their data into an asset collection. This may be an option if you will never change their content in EDG; in other words, for EDG purposes, their content is read-only. In this case, simply upload them into the EDG workspace in a project of their own. Then, create an asset collection in EDG and use Settings>Includes to include these files by reference. Note that in this case, you will not be able to use Search the EDG to find resources in these files, nor will they be found by the global lookup. These features only work for data that is physically part of EDG asset collections.

You also need to make sure that the files you are using do not contain owl:imports of graphs outside of EDG’s workspace. All imports must refer only to EDG asset collections or to files in the workspace. External imports will not be resolved, since dynamically loading data from an external server may time out and presents a security risk.
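
A quick way to check a file before loading is to list its owl:imports targets, for example with a SPARQL query like the following (a sketch you can run over the file in any SPARQL tool, including TopBraid Composer):

PREFIX owl: <http://www.w3.org/2002/07/owl#>
# List every import target so you can verify that each one resolves to an EDG
# asset collection or to a file in the workspace rather than to an external server.
SELECT DISTINCT ?import
WHERE { ?ontology owl:imports ?import }
ORDER BY ?import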

The rest of this section provides instructions specific to the type of asset collection you will be loading your RDF data into.

As described in the Importing RDF into the Taxonomy Editor section of the User Guide chapter, EDG expects that RDF imported into the Taxonomy Editor conforms to the W3C SKOS standard. That section describes specific details about the classes and properties that EDG expects to see, such as skos:ConceptScheme and skos:hasTopConcept. If you have SKOS RDF that does not fit this model (for example, skos:ConceptScheme is absent), you can use SPARQL CONSTRUCT or UPDATE queries in TopBraid Composer to add or convert to the necessary SKOS data. TopBraid Composer includes a range of features for developing and executing these queries, as well as for saving them in a SPARQLMotion script if you need to run a given CONSTRUCT query or set of queries on a regular basis. The Taxonomy Editor also expects the use of skos:prefLabel; if your RDF contains rdfs:label values, they will be converted to skos:prefLabel values.
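
For example, if the only problem is a missing concept scheme, a SPARQL UPDATE along the following lines can add one and link every concept without a broader concept as a top concept. This is a minimal sketch: the scheme URI is a placeholder, and the rule for choosing top concepts may differ for your data:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Add a concept scheme and declare concepts with no broader concept as its top concepts.
INSERT {
  <http://example.com/scheme> a skos:ConceptScheme ;
                              skos:hasTopConcept ?concept .
  ?concept skos:topConceptOf <http://example.com/scheme> .
}
WHERE {
  ?concept a skos:Concept .
  FILTER NOT EXISTS { ?concept skos:broader ?broader }
}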

The EDG Ontology Editor has no special requirements on which classes, properties or instances must be present. The only exception is rdfs:subClassOf statements for classes, which are needed for classes to appear in the class tree.
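
For illustration, the following SPARQL UPDATE declares two hypothetical classes together with the rdfs:subClassOf statement that places one under the other in the class tree:

PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.com/>
# Hypothetical classes: without the rdfs:subClassOf statement, ex:Sedan would not
# appear under ex:Vehicle in the class tree.
INSERT DATA {
  ex:Vehicle a owl:Class .
  ex:Sedan   a owl:Class ;
             rdfs:subClassOf ex:Vehicle .
}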

EDG Crosswalks contain RDF triples of the following form:

example:Vocabulary1_Resource1 skos:closeMatch example:Vocabulary2_Resource1 .
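
If you need to generate such triples rather than load existing ones, one option is a SPARQL CONSTRUCT query over the two vocabularies. The following is a simplified sketch that pairs resources by identical preferred labels; the graph URIs are placeholders, and real matching rules are usually more involved:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Pair resources from two vocabularies whose preferred labels are identical.
CONSTRUCT { ?a skos:closeMatch ?b }
WHERE {
  GRAPH <http://example.com/vocabulary1> { ?a skos:prefLabel ?label }
  GRAPH <http://example.com/vocabulary2> { ?b skos:prefLabel ?label }
}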

Valuable data for use in your vocabularies and asset collections may not be stored in RDF, but instead in another model or format. The Importing spreadsheet data section of the User Guide chapter describes how to import spreadsheets representing a taxonomy hierarchy in a choice of patterns. If your spreadsheet does not fit one of these patterns, TopBraid Composer offers several approaches for converting a spreadsheet into RDF that can then be converted to SKOS RDF with CONSTRUCT queries. In TopBraid Composer > Help, see Importing Data Sources > Import external information > Import Spreadsheets for an overview of these approaches and links to more detailed information.
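
As a sketch of what such a CONSTRUCT query might look like, assume the spreadsheet import produced one resource per row, typed with a hypothetical ex:Row class and carrying an ex:parent column; the query below maps those rows to SKOS concepts with broader relationships:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.com/spreadsheet#>
# Map spreadsheet rows (hypothetical ex:Row instances) to SKOS concepts.
CONSTRUCT {
  ?row a skos:Concept ;
       skos:prefLabel ?label ;
       skos:broader ?parent .
}
WHERE {
  ?row a ex:Row ;
       rdfs:label ?label .
  OPTIONAL { ?row ex:parent ?parent }
}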

You can also use tabular (no hierarchy) spreadsheets to import data, including creating relationships between resources. For crosswalks, the spreadsheet should have only two columns, representing the two sides of the crosswalk; no mapping will be required.

You can create an ontology from a spreadsheet, using the header row as properties and the worksheet name as a class.

If your data is in one or more relational databases, TopBraid lets you create connectors that make it possible to work with that data as RDF triples so that you can then convert it to SKOS. In Composer’s online help, select Working with Data Back Ends > Working with Relational Databases to learn more.

SPARQLMotion lets you create scripts by dragging and dropping from a wide selection of specialized modules into visual data manipulation pipelines. You can run these scripts interactively from within Composer or as web services deployed to your EDG server. SPARQLMotion module choices such as ConvertJSONToRDF and ConvertXMLByXSLT give additional options for taking advantage of non-RDF data that may be available to you. In the Composer online help, select Application Development Tools > SPARQLMotion Scripts for an overview of SPARQLMotion, and select Reference > SPARQLMotion Module Library Reference for a complete list of available modules. These modules can also be used within SPARQL Web Pages (SWP) scripts. Unlike SPARQLMotion, where data transformation scripts are developed using a visual drag-and-drop approach, SWP is a textual scripting language using custom HTML tags and JavaScript.

TopBraid’s Semantic XML feature gives you another option for converting XML to RDF, storing enough information about the input XML to let you round-trip the data back to a valid XML document. In the Composer online help, select Working with XML and Table files > Creating, Importing, Querying, Saving XML documents with Semantic XML to learn more about this feature.

If you are unsure which choices in the TopBraid Composer toolset would let you best take advantage of the data available to you, contact your EDG support representative.

Once you create a custom import script with TopBraid Composer, you can make it available to all EDG users by following the instructions in the Project Plug-ins section.