Reference data are standardized codes or data entities that are typically used by multiple applications as lists or tables. In fact, they are often called “code tables.” An individual code table may seem like a simple thing, but a well-managed collection of code tables and related reference data spread across an enterprise is a resource that can bring great value to that enterprise — or cause great problems if it is not well maintained. EDG lets you control your reference data so that you can put it to work for you as efficiently as possible.
For additional information on reference datasets see:
- TopQuadrant white papers for perspectives and details on reference data management and related topics, or
- the Asset Collection documentation.
This document is organized by roles showing how:
- Reference Data Stewards can create and modify enterprise reference datasets and ontologies, import reference data, and manage information about it.
- Data Stewards can create reference datasets that reflect reference data in sources they are responsible for. They can then use crosswalks to align these with the enterprise reference datasets for the same entity.
- Data Managers can export and provision reference data for use in their applications.
- Business Analysts and other users can consult EDG to learn more about codes and code sets important to their work.
Access TopBraid EDG Application
To work through this guide, use a browser to access the TopBraid EDG web-application running in one of the following environments.
- Create an EDG trial evaluation at TopQuadrant, and run EDG from the TopQuadrant servers. Submit an EDG evaluation request or contact TopQuadrant.
- Use TopBraid Composer – Maestro Edition (TBC-ME), and run its demonstration version of EDG. Download and install TBC-ME. If using sample data, this will also require separate installation of the EDG samples project.
- Install EDG on a server accessible to your network (which could also be a local Tomcat server, via localhost). For a custom install, contact TopQuadrant and see EDG Server Installation and Integration. If using sample data, this will also require separate installation of the EDG samples project.
For the TBC-ME option, launch TBC-ME and then start the demo version of EDG via the top menu: TopBraid Applications > Open TopBraid EDG. Browse to http://localhost:8083/tbl. Logging in as Administrator requires no password for the demo version. All asset collection types are available in the demo version.
For the other two options, the system administrator or TopQuadrant will provide you with a URL, a username, and a password. Browse to the URL and log in. Server licensing will determine the availability of the various asset collection types.
TopBraid EDG User Interface
For a basic orientation to the user interface, see EDG User Guide - UI Overview.
You will always see the collapsible left-hand Navigator, which helps you navigate between asset collections and pages of interest. Menu sections in the Navigator are also collapsible and expandable. A "hamburger" menu button in the header offers an alternative approach to quick navigation between different glossaries.
Reference Data Management
Getting Started for the Reference Data Steward
Defining the Structure for Reference Dataset
Each reference dataset designates some ontology class as the dataset’s main entity, which defines the type of the dataset’s reference instances, i.e., the individual code items.
Ontologies describe business entities, including entities for which you will govern reference data (codes). Ontologies can be thought of as a powerful flexible representation of business glossaries. An ontology may contain a class (entity) such as country, product category, industry and so on. Each of these entities can have different fields (properties) making it easy to support different types of reference data. Reference datasets in TopBraid EDG are not limited to having only a handful of predefined fields such as a code and a description. They can have any property you may need to capture. For example, a reference dataset for country codes may have properties such as the various ISO codes, capital, gross national product, and language.
In order to create reference data, we need to first define the corresponding entity and its properties in an ontology.
Select Ontologies in the left-hand navigation menu to see the list of ontologies you have access to. EDG lets you create a single enterprise ontology or a set of individual ontologies (for example, per department or business area) which can be combined with one another using the “includes” mechanism.
TopBraid EDG Samples project includes a number of sample ontologies and datasets.
This tutorial uses the ontology: Enterprise Ontology – Example. To obtain it please download the EDG samples.
In this tutorial, we will extend the Enterprise Ontology model with the definitions necessary to support a new reference dataset. Alternatively, a new ontology can be created. For information on creating a new ontology, see the user interface overview.
Select Enterprise Ontology from the table to go to a page where you can perform various operations with it – make changes, import data into it, export it, etc.
Users that have edit privileges can make ad hoc changes to a given ontology or dataset. Otherwise, they must follow a more formal process of modifying an ontology by using Workflows, which sandbox all changes into an isolated working copy until they are reviewed and approved. See Workflow Overview for details. In this tutorial we will make the change without using a workflow.
For details on the settings and menus in EDG’s editor pages, read the Asset Collection Guide – Editor page.
Creating a New Class
You will see several panels presenting ontology content.
The Class Hierarchy Panel, on the top left below, shows the classes in a tree structure. The Property Groups Panel shows the selected class’ properties (both attribute/datatype and relationship/object properties) as nodes in a tree.
The colored button at the top of the class hierarchy, next to the quick search field, will create a new class.
You may also create a new property or associate an existing property with a class by clicking the plus icon in the Property Groups panel.
The Node Shape Panel lets you view and create node shapes. These model elements support creating different role-specific views into reference data. We will not use this feature in this guide, but you can learn about it by looking at the user guide for Ontologies.
As shown in the screenshot above, clicking on a node in the tree in the Class Hierarchy Panel (such as the class Country) displays information about it in the Form panel to the right of the tree. If the panel is not there, click the dropdown located in the upper right corner of the page and drag Form onto the page. The Edit button at the top of the View/Edit form switches the form into edit mode, making all fields on the form editable. Edit mode may also display, and let you edit, fields that currently have no data and are therefore not shown in view mode. Alternatively, you can edit values for each field in-line by clicking on the pencil icon that appears when you position your mouse to the right of the field’s name.
Later in this tutorial a reference dataset of airport codes will be created and populated with data from a spreadsheet. The following fragment shows data in this spreadsheet:
|Airport Name||City||Country||Country Code||IATA Code||Latitude||Longitude|
|Keflavik International Airport||Keflavik||Iceland||IS||KEF||
|Sault Ste Marie||Sault Sainte Marie||Canada||CA||YAM||
|Winnipeg St Andrews||Winnipeg||Canada||CA||YAV||
|St Anthony||St. Anthony||Canada||CA||YAY||
To add model support for this information, create a class named ‘Airport’ that will be used as the main entity in the reference dataset. To do this, select the top-level class named ‘Thing’ in the class hierarchy, click the yellow button in the header of the Class Hierarchy pane, enter the name “Airport” and click OK.
You will see the newly created class displayed in the Edit/View pane.
If desired, provide a description of your new class in the comment field.
Next, we need to add Airport as a public class of this ontology. First, click Enterprise Ontology Example v1.1 at the top of the page and then Edit.
Use the dropdown in the upper right of the form to select GraphQL Schema. Edit the “public class” field in the Shapes in GraphQL Schema section, click the plus sign and start typing Airport in the field. After selecting Airport, click Save Changes.
We will now create the following attributes for airports.
|Attribute Name (Label)||Description (Comment)||Datatype|
|airport city||Main city served by airport. May be spelled differently from the airport’s name.||string|
|IATA airport code||An IATA airport code, also known as an IATA location identifier, IATA station code or simply a location identifier, is a three-letter code designating many airports around the world, defined by the International Air Transport Association (IATA).||string|
|latitude||A horizontal position of a location on the Earth according to a geographical coordinate system in decimal degrees, usually to six significant digits. Positive latitude is above the equator (North), and negative latitude is below the equator (South).||decimal|
|longitude||A vertical position of a location on the Earth according to a geographical coordinate system in decimal degrees, usually to six significant digits. Positive longitude is East of the prime meridian, and negative longitude is West of the prime meridian.||decimal|
Create attributes by selecting the Airport class, clicking the plus icon in the Property Groups panel, and then clicking the green icon. After entering the name of the attribute and clicking OK, you will see the data entry form shown below.
As an alternative to manually entering classes and properties, you can use Import > Import Schema from Spreadsheet to automatically create them from the first row of the spreadsheet and then adjust as necessary.
Creating Label Attribute
Note that an attribute for the airport name has not been created. This is because there is a built-in attribute “label” which is intended to hold names. Label is always asked for in the Create New dialog for reference data items. If we want to edit this field later, we need to tell TopBraid EDG to show it on the form. Since this is a special built-in field, this requires some additional setup.
Click on the Airport class. On the Form panel, click Modify, then Add Label Property Declaration.
Defining Attribute to be used as a Primary Key
TopBraid EDG will always create a globally unique resource identifier, a URI, for each resource you create. There are different options for how the URI may be constructed. These options are described in detail in the User Guide.
For reference datasets, each entry in a dataset gets a URI derived from the reference data code. To enable this, you need to identify the field which will contain code values. This field is declared to be a primary key for the entity. Note that the field used as a primary key must always have unique values for a given class of codes.
We will use IATA airport code as the primary key. Click on this property and click the Edit button. Scroll down to the String Constraints section and type in a namespace to prepend when creating the unique identifiers. For example, http://example.org/Airport-.
Click on Save Changes.
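As a rough illustration of how a primary-key namespace turns code values into identifiers, the sketch below prepends the example namespace to IATA codes taken from the spreadsheet fragment. The function names `build_uri` and `build_uris` and the duplicate check are assumptions made for this illustration; they are not EDG's internal implementation.

```python
# Hypothetical sketch: deriving resource URIs from primary-key values.
AIRPORT_NAMESPACE = "http://example.org/Airport-"  # namespace from the example above


def build_uri(namespace: str, code: str) -> str:
    """Prepend the namespace to a primary-key value to form a URI."""
    if not code:
        raise ValueError("a primary-key value is required to build a URI")
    return namespace + code


def build_uris(namespace: str, codes: list[str]) -> dict[str, str]:
    """Build one URI per code, rejecting duplicates because keys must be unique."""
    uris: dict[str, str] = {}
    for code in codes:
        if code in uris:
            raise ValueError(f"duplicate primary-key value: {code}")
        uris[code] = build_uri(namespace, code)
    return uris


airport_uris = build_uris(AIRPORT_NAMESPACE, ["KEF", "YAM", "YAV", "YAY"])
print(airport_uris["KEF"])  # http://example.org/Airport-KEF
```

The duplicate check mirrors the requirement that a primary-key field must hold unique values for a given class of codes, since two rows with the same code would otherwise collapse into one URI.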
Next, click Create Relationship, as you did when creating the other properties above, and name it “airport country”.
In the description field, describe it as “A country where an airport is located”. Set its Type/Class of Values to the Country class. (Failing to do this can cause problems when it’s time to import data into the new reference dataset.) To do so, start typing “Count” in the class field in the Type/Class of Values section and pick “Country” as it appears in the autocomplete.
Since the primary key for ISO Country is its two-character ISO country code and the spreadsheet contains this information, EDG will be able to create a relationship between airports and countries as we import spreadsheet data. Note that we have not created a field for the country name. Names and other information about countries are already maintained as part of the country codes reference dataset, and therefore including names would be redundant.
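To make the idea concrete, here is a minimal sketch of how a country code in a spreadsheet row could be turned into a relationship between two resources. The country namespace, the `known_country_codes` set, and the function name are all hypothetical; the actual identifiers depend on how the country dataset's primary key is configured.

```python
# Hypothetical sketch: linking an airport to a country via the country's primary key.
AIRPORT_NAMESPACE = "http://example.org/Airport-"  # namespace from the primary-key example
COUNTRY_NAMESPACE = "http://example.org/Country-"  # assumed; the real namespace may differ

# Codes assumed to exist in the country reference dataset.
known_country_codes = {"IS", "CA"}


def link_airport_to_country(iata_code: str, country_code: str) -> tuple[str, str, str]:
    """Return a (subject, property, object) statement for the relationship."""
    if country_code not in known_country_codes:
        raise KeyError(f"no country with code {country_code!r}")
    return (
        AIRPORT_NAMESPACE + iata_code,
        "airport country",
        COUNTRY_NAMESPACE + country_code,
    )


statement = link_airport_to_country("KEF", "IS")
print(statement)
```

Because the link is built from the code alone, the country name never needs to be stored with the airport.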
In the next step, we will create a reference dataset to store reference data for the airports.
For more information on working with ontologies, and especially creating property shapes that will let you validate reference data, see Working with Ontologies. Note that instead of creating each airport property one by one, we could have used Import > Import schema from a spreadsheet. This function is used to automatically create a property for each column in a spreadsheet. We have elected not to use it and, instead, walk you through the process of creating each property one by one.
Creating Reference Datasets
You can click on the + icon and select Reference Datasets in the dropdown to create a new reference dataset.
However, we want to automatically associate the reference dataset we’re about to create with a particular “governance area”. We can do this by creating the dataset directly from the Governance Areas page.
Governance areas group asset collections according to an organization’s business or data subject concerns. Governance areas are used to define a delineated part of stewardship. They partition and delegate ownership of assets, and define a meaningful context for assets that are associated with a governance area.
Select the Governance Areas link located in the left menu under Governance Model section. First, create a new governance area. Click the Create Data Subject Area button, add a data subject area with the label
Not every user will have permissions necessary to modify governance areas. If you can’t create a new governance area, contact your TopBraid EDG Administrator.
Now you’re ready to create the dataset. Choose Reference Dataset in the Choose type dropdown for Create new.
You will see the following page:
Enter Airports as the label (or name) of the dataset, and for its description enter: Reference dataset of airports with IATA codes.
The Ontology to Include option lists the ontologies that are available to base your reference dataset on. In this case, select Enterprise Ontology Example v1.1 as the ontology to use. It contains the Airport class that we have defined for our airports data. Click Create Reference Dataset.
You will see a message that the dataset was created and you will be forwarded to the Import page where you can load data.
However, before we can do this, we must finish setting up the new dataset by identifying its main entity.
Setting the Main Entity
The ontology used for creating a reference dataset will typically contain several classes (entity types). After creating the reference dataset and before starting to work with data, you need to tell TopBraid EDG what reference data will be in the dataset. This is done by identifying the “main entity” for a dataset. In our example, it is the Airport class. There are two ways to set the main entity initially.
- If the main entity is unset, clicking the Codes tab to open the editor will prompt TopBraid EDG to ask you for the main entity class.
- A reference dataset’s main entity class can also be set or changed by clicking on: Manage > Main Entity (Class).
We will use the first method. Click the Codes tab and select Airport from the provided dropdown, which lists the classes available in the included ontology.
You will now use the Import tab to import reference data from the spreadsheet you downloaded earlier.
Importing Reference Data
Select Import > Import Spreadsheet using Pattern. Then click Choose File to select the spreadsheet. (Download the airports.xlsx spreadsheet to get a local copy to import.) This page has two more fields:
- Sheet index: by default this is 1. This spreadsheet has only one worksheet, so there is no need to edit it.
- Entity type: a list of classes from the included ontology (the enterprise ontology) to indicate which one is being populated by the airport. Ensure that
Clicking Next shows several potential patterns for spreadsheet data. Select No Hierarchy. (Note: Reference data supports managing hierarchies as well as flat lists. However, the spreadsheet we are importing does not contain any hierarchical structures.)
The next step is to map the spreadsheet columns to the properties of the Airport class as shown below, which maps the columns to the properties defined above and to the built-in “label” property. Note that in the image below the Altitude column was deliberately left unmapped to demonstrate that only mapped columns will be imported. The Country column was also not mapped because it contains country names that are already managed as part of the ISO Country Codes reference dataset, which is also included in the samples project.
Click the Finish button. After the data is imported, click the Codes tab to view the reference dataset.
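The effect of the column mapping can be pictured with a small sketch. The mapping dictionary below is an assumption based on the properties defined earlier (the exact mapping is shown only in the screenshot), and `map_row` is a hypothetical helper, not an EDG function. Columns without a mapping, such as Country, are simply dropped.

```python
# Hypothetical sketch of column-to-property mapping during a spreadsheet import.
COLUMN_TO_PROPERTY = {
    "Airport Name": "label",
    "City": "airport city",
    "Country Code": "airport country",
    "IATA Code": "IATA airport code",
    "Latitude": "latitude",
    "Longitude": "longitude",
}


def map_row(row: dict[str, str]) -> dict[str, str]:
    """Keep only mapped columns that have a value, renamed to property names."""
    return {
        COLUMN_TO_PROPERTY[column]: value
        for column, value in row.items()
        if column in COLUMN_TO_PROPERTY and value != ""
    }


# Rows taken from the spreadsheet fragment shown earlier in this guide.
spreadsheet_rows = [
    {"Airport Name": "Keflavik International Airport", "City": "Keflavik",
     "Country": "Iceland", "Country Code": "IS", "IATA Code": "KEF"},
    {"Airport Name": "St Anthony", "City": "St. Anthony",
     "Country": "Canada", "Country Code": "CA", "IATA Code": "YAY"},
]

imported = [map_row(row) for row in spreadsheet_rows]
for record in imported:
    print(record)
```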
You will now be directed to the Editor page with the default layout for Reference datasets.
In the Search Panel, click the Columns icon to add more columns to the table.
Clicking on a row displays information in the Form panel, which is on the right of the page by default. If the Form panel is not visible, click the dropdown located in the upper right corner of the page and drag Form onto the page.
The table displays 100 rows at a time by default. This default can be changed by selecting a different value in the Show dropdown located at the bottom of the table as shown above.
To save the current configuration of columns as a default for all users, click the Save icon ensuring Default is set to true.
Including other Reference Datasets
As shown in the first screenshot of the reference data, the Airport Country column contains URIs of the countries and not their names or code values. This happens because the reference dataset describing the country codes has not been included in the Airports dataset. We can fix this by clicking the Settings tab and including the appropriate reference dataset. Click on Includes.
In the pop-up window, select Country Codes to include it in the Airports dataset. After selecting it, click Close. Instances of the Country class will now be included in the Airports dataset by reference, meaning the data is not copied.
Referencing another dataset in this manner ensures that reference data for countries is maintained in one place. If a country is renamed (for example, Cape Verde, an island country in West Africa, was renamed the Republic of Cabo Verde), the update needs to occur in only one place, the ISO Country dataset. All datasets that include ISO Country will see this change immediately. At the same time, you will have access to country names and all other information from any reference dataset that includes country codes. The names and other reference data for countries are stored in the Country Codes dataset.
Once the reference dataset for countries is included, EDG will automatically match countries to the values of the “airport country” property. Click on the Codes tab. Note that Country codes appear in the Airport Country column instead of URIs as before. These codes come directly from the ISO Country dataset.
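The difference between seeing a raw URI and seeing a country code can be sketched as a lookup against included datasets. This is a simplified mental model only: the dictionaries, the country URI, and the `display_value` helper are hypothetical stand-ins, and EDG's actual resolution works on graphs rather than Python dictionaries.

```python
# Hypothetical sketch: resolving a referenced resource through included datasets.
AIRPORT_KEF = "http://example.org/Airport-KEF"
COUNTRY_IS = "http://example.org/Country-IS"  # assumed URI for illustration

# The Airports dataset stores only a reference (URI) to the country.
airports = {
    AIRPORT_KEF: {
        "label": "Keflavik International Airport",
        "airport country": COUNTRY_IS,
    },
}

# The Country Codes dataset holds the country's own data in one place.
country_codes = {COUNTRY_IS: {"label": "Iceland", "code": "IS"}}


def display_value(uri: str, included: list[dict]) -> str:
    """Show the code of a referenced resource if an included dataset defines it,
    otherwise fall back to the raw URI."""
    for dataset in included:
        if uri in dataset:
            return dataset[uri]["code"]
    return uri


country_uri = airports[AIRPORT_KEF]["airport country"]
print(display_value(country_uri, included=[]))               # raw URI, nothing included
print(display_value(country_uri, included=[country_codes]))  # IS
```

Nothing is copied into `airports`; the country record is read from the included dataset at display time, which is why a change made there is visible everywhere it is included.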
Click on any of the rows to see a View/Edit form for the selected airport.
The “airport country” property is now populated with a country code from the ISO Country dataset. Clicking on a country code link will open up a form that will show you other information about the country directly from the ISO Country dataset.
You can change the “focus” of the table from Airports to other data by using the dropdown field at the top of the table. Currently, ‘Airport’ is chosen. You can switch the focus to any other class related to Airport. In our case, the only related class is Country.
Included data, such as the Country Codes data referenced by the Airports dataset, can be viewed and searched, but modifications to included data are not permitted. Included data can only be modified by editing the included reference dataset directly. You will be able to edit only codes for the main entity or one of its subclasses.
Managing Metadata for a Reference Dataset
Reference datasets (and, in general, any asset collection in EDG) can have metadata such as name, description, status, etc. The metadata associated with an asset collection can be viewed and modified on the Form panel. When you click on the Codes tab, EDG will always display information about the dataset on a form. Once you navigate away from it by, for example, clicking on various assets contained in a dataset, you can come back to this view by either clicking on the home button or by clicking on the name of a reference dataset in the header bar.
Use one of these options. Scroll down and expand the Status sub-section to see available information about the Airports dataset.
When a dataset is first created, the status is automatically set to “Under development”. It can be updated to reflect the current status of a dataset.
TopBraid EDG is shipped with some predefined status values. They are configurable if your organization needs a different set of values.
Click the Edit button to see more available fields. You may want to differentiate private (internal) reference data from public (external) reference data such as ISO country codes. Set “is external dataset” to “true” in the Status section of the form. IATA codes are maintained by the IATA Association, which publishes updates bi-annually. Change the status code to Approved. Click Save Changes at the top of the form.
Once the status of a reference dataset is approved for use, you will no longer be able to delete codes from the dataset, but you will be able to change information about them.
Documenting a Reference Dataset as an Enterprise Reference Dataset
Your organization may have several reference datasets in EDG that contain codes for a given entity. For example, you may have different existing applications and corresponding sources that already store and use airport codes. The goal of standing up a system for managing reference data is to achieve alignment across your existing reference data and to streamline its management. This alignment takes time. At least initially, in addition to a “master” reference dataset that you want to be the definitive source of reference data for a given entity across all systems, you may have reference datasets that capture what each of your systems is using.
To differentiate between your master reference dataset for airport codes and other “in situ” reference datasets, click Edit in the Status section of the form and find the “is enterprise dataset” field. Set this flag to true and click Save Changes.
If another reference dataset is created for the same entity, it could be mapped to the enterprise dataset using Crosswalks. TopBraid EDG can auto-create crosswalks between two datasets. It also offers crosswalk web services to translate between codes.
Creating Reference Data Facts
You may want to record some additional information about this reference dataset. This can go into the description field. You can also add new metadata fields, or you can use a pre-built property called fact. To enable this property, click the Settings tab and, in the Includes dialog, look for Reference Data Fact properties. This is a small pre-built ontology that defines a property called fact. Include it.
Click on Codes to go back to the editor. Click on the dropdown in the upper right corner of the Airports form and switch your view to Facts. Enter the following “fact”:
IATA codes should not be confused with the FAA identifiers of US airports. Most FAA identifiers agree with the corresponding IATA codes, but some do not, such as Saipan whose FAA identifier is GSN and its IATA code SPN, and some coincide with IATA codes of non-US airports.
Note that the text area allows rich text, including hyperlinks. To add a link, select the text to be hyperlinked, such as “Saipan”, and click the chain link in the icon box. Add the hyperlink to the text box that appears.
Click on the plus + icon to the left of the fact field to add an additional entry and enter this additional fact there:
Since “Q” is used for international communications, IATA airport codes never begin with “Q”.
Save your changes. The facts are now part of the metadata for the dataset and can be referenced, searched, etc.
You can define facts at the dataset level, and you can also specify them for a given code in the reference dataset. To do the latter, include the pre-built Reference Data Facts ontology in the ontology that defines your main entity class, and associate the fact property with that class.
Entering Subscription Information for External (public) Reference Data
In the editor Form, click the Edit button again. Set “is external dataset” to true and save. Edit again and you will see a new sub-section on the form called Subscription; this is used to capture subscription-related information for external reference datasets. Add “IATA Association” to the “sourced from” field. You will only need to type the first few letters of its name, because only one defined organization begins with those letters. Click the Save Changes button.
TopBraid EDG is shipped with predefined metadata fields for reference datasets. They are configurable if your organization needs different metadata. EDG is a semantic, model-based solution. Configuration is done using steps similar to those used to modify ontology models to accommodate new reference data.
Assigning Access Privileges to other Users
For any asset collection in EDG, including reference datasets, a user can have one of the following permission roles (see User permissions for more information):
- Viewer A Viewer can browse a dataset, viewing all the reference data (as well as any change history associated with that data) and the metadata associated with a dataset. A Viewer can create saved searches and export data. They can create and view tasks, add comments and change status of a task assigned to them. A viewer can also start a workflow. The Viewer then becomes the Manager of the working copy that is associated with the workflow. However, these changes will not affect the reference dataset until they are approved and committed by a user that has Editor permission for the dataset.
- Editor In addition to being able to perform all activities that a Viewer can perform, an Editor can make changes to the dataset’s metadata and to the reference data itself.
- Manager A Manager has the most capabilities. In addition to all the activities that an Editor can perform, a Manager can delete an entire dataset, they can change the default columnar view for all users and they control the access privileges that other users have over a particular dataset by assigning Manager, Editor, or Viewer permission roles to them. They can also reassign and change the status of all tasks, even those that are not assigned to them. A person who creates a reference dataset automatically becomes its Manager.
To give others access to the dataset, go to the Users tab on the dataset’s home page.
Permission levels can be set for (1) individual users and (2) user security roles (e.g., from Tomcat or LDAP). The list of users you will see on this tab can include individual users and LDAP roles. A Manager can assign Manager, Editor and Viewer privileges to each user or user group. The Users page is also used to set up governance roles (as defined in the Governance model) for individual reference datasets. Governance roles can also be defined for the business area or data subject area that a reference dataset is associated with.
Governance roles provide an alternative approach to assigning permissions. A user who has any governance role for a reference dataset (or any other asset collection), specified either directly for the dataset or indirectly for a subject area the dataset belongs to, will automatically get Viewer permission. You can also assign Editor and Manager permissions to governance roles.
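The relationship between the three permission roles and governance roles can be summarized in a short sketch. It models the roles as an ordered scale and covers only a handful of the activities described above; the activity names, `REQUIRED_ROLE` table, and helper functions are illustrative assumptions, not EDG's permission API.

```python
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    """Permission roles, ordered so that each level includes the one below it."""
    VIEWER = 1
    EDITOR = 2
    MANAGER = 3


# Minimum role assumed for a few of the activities described in this section.
REQUIRED_ROLE = {
    "view data": Role.VIEWER,
    "export data": Role.VIEWER,
    "start workflow": Role.VIEWER,
    "edit data": Role.EDITOR,
    "edit metadata": Role.EDITOR,
    "delete dataset": Role.MANAGER,
    "assign permissions": Role.MANAGER,
}


def effective_role(explicit: Optional[Role], has_governance_role: bool) -> Optional[Role]:
    """Any governance role implies at least Viewer; an explicit role can raise it."""
    implied = Role.VIEWER if has_governance_role else None
    candidates = [role for role in (explicit, implied) if role is not None]
    return max(candidates) if candidates else None


def can(role: Optional[Role], activity: str) -> bool:
    """Check whether a role is high enough for the given activity."""
    return role is not None and role >= REQUIRED_ROLE[activity]


steward = effective_role(explicit=None, has_governance_role=True)
print(can(steward, "view data"))  # True
print(can(steward, "edit data"))  # False
```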
Modifying Reference Data
Dominica’s main airport, the Melville Hall Airport, was just renamed to the Douglas-Charles Airport in tribute to its late prime ministers, Rosie Douglas and Pierre Charles. While your next bi-annual update from the IATA Association will reflect this change, you need to make it ahead of receiving the update.
Click the Codes tab. Search for airports in Dominica by clicking the Filters icon, then selecting the “airport country” field. Start typing “Domi..” to get the match on the official country name, “Dominica”.
To get back the unfiltered list of all airports, click the X next to the filter.
You can now change the airport name by clicking on Melville Hall, then clicking on the Edit button.
When you make the change to rename its label value to “Douglas-Charles Airport”, you can check the Enter log message before clicking the Save Changes button if you want to include a log message about your change.
As an alternative to clicking Edit, you can mouse over the field value and click the pencil icon that appears to the left of the value. This will open just that field for inline editing.
You will not be able to enter a log message if you use inline editing.
TopBraid EDG keeps a complete audit trail of all changes. Click the dropdown located in the upper right corner of the page and drag Change History onto the page.
Instead of using Filters, you could have also clicked the Columns icon, added the “airport country” column to the table and typed Dominica in the Search field.
This approach, however, does not search across all reference data. It will only filter within the data that has been loaded into the table. Data loaded into the table can be a subset of the codes in a reference dataset. By default, TopBraid EDG will load 1,000 rows. Our airports dataset has over 5,000 entries. Thus, you may not find the result you are looking for even if data exists. Your EDG administrator can change the default setting. However, this may impact performance for large datasets. Using search filters is always the most reliable approach for large datasets.
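The following sketch shows why searching within the table can miss results that a filter finds. The data is synthetic (made-up codes and country values), and the two functions only mimic the behavior described above; the load limit of 1,000 and the dataset size of 5,000 come from the text.

```python
# Synthetic data for illustration only; codes and country values are made up.
LOAD_LIMIT = 1000  # default number of rows loaded into the table, per the text above

all_rows = [{"code": f"X{i:04d}", "airport country": "Elsewhere"} for i in range(5000)]
all_rows[4321]["airport country"] = "Dominica"  # a match that sits beyond the load limit

loaded_rows = all_rows[:LOAD_LIMIT]


def table_search(rows: list[dict], text: str) -> list[dict]:
    """Mimic the table's Search field: match only against rows already loaded."""
    return [row for row in rows if text.lower() in row["airport country"].lower()]


def filter_search(rows: list[dict], field: str, value: str) -> list[dict]:
    """Mimic a search filter: evaluate the condition against the whole dataset."""
    return [row for row in rows if row[field] == value]


print(len(table_search(loaded_rows, "Dominica")))                   # 0
print(len(filter_search(all_rows, "airport country", "Dominica")))  # 1
```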
Creating New Codes
To create a new airport, click on the New button in the button-row on the Search Panel.
Export, Collaboration, and other Activities
Some of the data stewards’ tasks overlap with the tasks of other users. For example, stewards may build exports of reference data, but so do data managers. These overlapping activities, including collaboration between users working with reference data, are covered in the Getting Started Guides for Data Manager and Business Analyst.
Creating a Crosswalk
Some systems may use a different local set of codes for the same entity – in our case, Airport. In these cases, you will want to map local, in-situ codes to the enterprise reference dataset for airports.
First, let’s extend the ontology by creating a new class “Local Airport”. Define for it an attribute “local airport code” with the string datatype. Make it a primary key for the class. Specify the start of a URI pattern of your choice.
Now, create a new reference dataset. You can do this from the Governance Areas page as described previously. Alternatively, go to the EDG home page and click Reference Datasets in the left navigation menu under Asset Collections. You will see a page listing all reference datasets you have access to. This page includes a Create New Reference Dataset link. When a dataset is created this way, it will not be associated with any governance area. You can add an association to a governance area later by updating the dataset’s subject area in the editor Form.
Let’s assume that it is a dataset used by a hypothetical Flight Tracker application and call it Flight Tracker Airport Codes. Base it on the Enterprise Ontology.
Click on Codes. When asked, set the main entity to Local Airport and click on Continue. Create a few New York area airports using data from the table below.
|Local Airport||Local Code|
Create a new Crosswalk from the Flight Tracker Airports to the enterprise reference dataset Airports as shown in the image below. Click Finish.
You can now map the two sets of airport codes manually or automatically. TopBraid EDG supports many-to-many mappings. Click on Mappings to view the crosswalk. Initially, it has no mappings. To map manually, click on a row and start typing in the Match column in the form.
An autocomplete list will appear. Select your choice from the dropdown and click on Add Match.
To auto-map, select the Generate Mappings button.
TopBraid EDG will generate some suggested mappings for you based on the airport names. Move the confidence level to 50% in the slider to filter out unlikely suggestions.
You can now accept suggestions one by one. Alternatively, move the confidence level higher, to 70% for example, accept all top suggestions, and then individually pick any lower-confidence suggestions you want to apply. From the generated list, we want to accept the La Guardia, Newark Liberty and Westchester Co mappings. The official name of the Islip airport on Long Island is Long Island MacArthur, so no match was found for it. Add this mapping manually. Your crosswalk should now look as follows:
To see more information about the mapped airports including their IATA codes, you can click on a row. The form will open in the Form panel. Click on the arrow next to the airport.
For more on working with crosswalks, see the Crosswalk User Guide pages.
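To build intuition for how name-based suggestions and a confidence threshold interact, here is a minimal sketch. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the algorithm EDG actually uses to generate mappings is not described in this guide, and the function name and sample names are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def suggest_mappings(local_names, enterprise_names, min_confidence=0.5):
    """Return (local, enterprise, score) for each local name whose best
    candidate scores at or above min_confidence.

    Stand-in similarity measure; not EDG's actual matching algorithm.
    """
    suggestions = []
    for local in local_names:
        best_name, best_score = None, 0.0
        for candidate in enterprise_names:
            score = SequenceMatcher(None, local.lower(), candidate.lower()).ratio()
            if score > best_score:
                best_name, best_score = candidate, score
        if best_name is not None and best_score >= min_confidence:
            suggestions.append((local, best_name, round(best_score, 2)))
    return suggestions


# Hypothetical sample names for illustration.
local = ["La Guardia", "Islip"]
enterprise = ["La Guardia", "Long Island MacArthur", "Newark Liberty Intl"]
top_suggestions = suggest_mappings(local, enterprise, min_confidence=0.7)
```

With these inputs, "La Guardia" is suggested with full confidence, while "Islip" receives no suggestion at the 70% level because its name shares too little text with "Long Island MacArthur". This mirrors why the Islip mapping has to be added manually.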
Documenting the Use of a Reference Dataset
If you are using TopBraid EDG for Metadata Management together with TopBraid EDG-RDM, you can document the use of a reference dataset in your applications catalog, data assets catalog and/or business glossary. See the relevant User Guides for more details.
Getting Started for the Data Manager
While this section can serve as a standalone tutorial, it assumes that all steps described in the Getting Started for the Data Steward section have already been completed: the Airports reference dataset has been created and populated with data, and you have access to it.
Defining Reference Data Export
As a data manager, you may need to distribute reference data for use in your data source. Export is one way of doing this. Reference data can be exported in full or as subsets of data defined through search criteria. After finding the reference dataset you need, click the dataset’s Export tab to view the available exports. (Examples in this section use the Airports reference dataset.)
This tab includes an option to export all information available in a dataset. There may also be exports that focus on specific subsets of data; these are accessible from the Export Saved Search or Export using Saved SPARQL Query links. If there is no export that suits you, you can create one.
Creating a SPARQL query for re-use requires knowledge of the SPARQL query language. SPARQL queries can be defined, tested and saved by clicking on the SPARQL Endpoint link. Export formats for these queries include CSV, TSV, JSON and XML.
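As a rough sketch of what such a query might look like and how it could be sent over HTTP, the example below builds a request following the standard SPARQL 1.1 Protocol (query passed as a `query` parameter, result format chosen with the `Accept` header). The endpoint URL, the `ex:` namespace, and the class and property names are all hypothetical placeholders; check your own ontology and EDG server for the real values, and note that authentication is not covered here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute the address of your own server.
ENDPOINT = "https://edg.example.org/sparql"

# Hypothetical namespace, class and property names for the Airports dataset.
QUERY = """
PREFIX ex: <http://example.org/ontology/>
SELECT ?airport ?iataCode
WHERE {
  ?airport a ex:Airport ;
           ex:iataCode ?iataCode .
}
ORDER BY ?iataCode
"""


def build_sparql_request(endpoint, query, accept="text/csv"):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


request = build_sparql_request(ENDPOINT, QUERY)
# To execute it: urllib.request.urlopen(request).read()
```

Requesting `text/csv` corresponds to the CSV export format mentioned above; other standard result media types could be requested the same way.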
Saved searches can be created entirely in the EDG UI. Click the Codes tab, then create and save a new search.
First, add columns for IATA code, airport city and country. Click on the Filter icon and select the airport country column. Start typing “United” in the airport country field and pick “United States of America” from the autocomplete list. Results appear in the grid after the entry is picked. You can choose an export format by clicking on the Export icon. Export formats include Excel, CSV, TSV, and JSON.
If these results fit your needs and you expect to pull this data from the dataset on a periodic basis, save the search by clicking the Save icon above the filter and giving your search a name such as “US airports”. Saved searches are web services that you can use to automate distribution of reference data.
Open the Search Library panel by clicking the dropdown in the upper right corner of the page and dragging Search Library onto the page. Selecting a saved search populates your search criteria (filters) as specified by that search, and you can then click the Select search button to re-run it.
When selecting a saved search from the list, note there is a Service URL column that can be used as a RESTful web service call to invoke the search.
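A client that consumes a saved search through its Service URL might look like the sketch below. It assumes the service returns CSV with a header row and that no extra authentication step is needed; both are assumptions, so adjust the parsing and the request to match what your server actually returns. The function names and sample row are hypothetical.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_saved_search(service_url):
    """Call a saved search by its Service URL and parse the response.

    Assumes a CSV response; copy service_url from the Service URL column.
    """
    with urlopen(service_url) as response:
        return parse_csv_rows(response.read().decode("utf-8"))


# Hypothetical sample of what a "US airports" result might contain.
sample = (
    "IATA code,airport city,airport country\n"
    "LGA,New York,United States of America\n"
)
rows = parse_csv_rows(sample)
```

Keeping the parsing separate from the network call makes it easy to test the client against a saved sample before pointing it at a live server.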
See Searching in the editor for more information on specifying search criteria.
Using TopBraid EDG-RDM Web Services
TopBraid EDG includes pre-built services for validating your locally stored reference data against the datasets managed by EDG. It also includes crosswalk services for translating from one set of codes to another set of mapped (cross-walked) codes. See the relevant Guides for more details on how to use these services.
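Conceptually, validation checks local values against the set of managed codes, and crosswalk translation looks up the codes mapped to a given local code. The sketch below shows both ideas on in-memory data; it does not call the EDG services, whose request and response formats are documented in the relevant Guides. The local codes (`FT-001` and so on) and function names are hypothetical; the IATA codes are those of the New York area airports used in the crosswalk example.

```python
# IATA codes for La Guardia, Newark Liberty, Westchester Co and Long Island MacArthur.
ENTERPRISE_CODES = {"LGA", "EWR", "HPN", "ISP"}

# Hypothetical local codes. Crosswalks are many-to-many, so each local code
# maps to a set of enterprise codes.
CROSSWALK = {
    "FT-001": {"LGA"},
    "FT-002": {"EWR"},
    "FT-003": {"HPN"},
    "FT-004": {"ISP"},
}


def validate(codes, valid_codes=ENTERPRISE_CODES):
    """Return the codes that are not present in the reference dataset."""
    return sorted(set(codes) - valid_codes)


def translate(local_code, crosswalk=CROSSWALK):
    """Return the enterprise codes mapped to a local code (empty if unmapped)."""
    return sorted(crosswalk.get(local_code, set()))


invalid = validate(["LGA", "XXX", "EWR"])
mapped = translate("FT-001")
unmapped = translate("FT-999")
```

Returning a list from `translate` rather than a single value reflects the many-to-many nature of crosswalk mappings.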
Getting Started for the Business Analyst
While this section can serve as a standalone tutorial, it assumes that all steps described in the Getting Started for the Data Steward section have already been completed: the Airports reference dataset has been created and populated with data, and you have access to it.
Finding a Reference Dataset
When you click on the Reference Datasets link in the left-hand navigator, you will see a table listing the reference datasets you have access to. This list can be long, especially in large organizations with lots of different reference data.
The table displaying datasets is sortable. Click on any column header to sort by the subject area a dataset belongs to, creator, date of creation, user who last updated it, update date, or main entity.
If you know of collections (e.g., ontologies or reference datasets) in your TopBraid EDG system that do not appear, you might not have the appropriate viewing or editing privileges for them. Each such collection requires a manager to provide access by setting you (or your security role) as a viewer, at least. See the collection type’s User Roles utility documentation for details about these steps.
Further, you can filter the table by typing in the Refine field. The text string entered will be matched against the information in all columns.
Finding a Code
To find a specific code, go to Find Code in the left hand side navigator.
You can also use the Search the EDG facility as described in Getting Started with Business Glossaries.
Viewing Dataset’s Metadata
The editor form contains descriptive and contextual information about the dataset, grouped into sub-sections. Note that empty sub-sections might not be displayed until the form is placed into Edit mode.
Also, some “dependent” (variant) sections might not appear unless certain conditions apply. For example, setting Status > is external dataset > true identifies a reference dataset as external/public, which makes available the Subscription section with associated fields describing the source of the public reference data and how and when it gets updated. The Property Definition (Semantic Analysis) section has a field-by-field description of each field in the dataset’s main entity class.
Using Reference Dataset and Data Facts
As a Business Analyst, you may have a report that needs to include a data feed that uses FAA airport identifiers. On review, the FAA identifiers in the feed seem to match the IATA codes, but you want to double-check that the feed would integrate correctly with the rest of your data, which uses IATA codes.
Expand the Reference Dataset Facts sub-section of the Facts Shape section of the form. You will learn that while many FAA identifiers are identical to IATA codes, there are also differences. Assuming that these are the same codes would have led to errors in the integrated reports. To integrate the data correctly, you should ask a steward to build a crosswalk between the two sets of codes.
Creating Tasks and asking Questions about a Code
You may want to ask the reference data governance team to add FAA identifiers to the reference dataset because you believe this information will be useful not only for your immediate task but also for other applications, and should therefore be managed with the rest of the reference data.
Reference datasets let you log requests and questions in the form of Tasks. Tasks can be associated with an individual code or with the entire dataset. To create a task for the entire dataset, go to the Tasks tab for a dataset, click the Create new task button and enter:
“Most of my data is coded with IATA codes, but I am starting to integrate new data feeds that use FAA identifiers. Please expand the dataset to include FAA identifiers.”
You can select which user to assign the task to. By default it will be assigned to the dataset’s manager. Click the Create Task button.
The task is now displayed in the tab. Tasks can be filtered by assignee and by status. There is a Comments section below Summary that lets users post responses or ask for additional information.
To create a task for a specific airport, click Codes, select the code you want to associate a task with and click on the Explore icon at the top of the form.
You are now ready to explore the Asset Collection Guide to learn more about the many capabilities of TopBraid EDG, including workflows for team collaboration, importing more complex spreadsheets, and more.