Licensing and Enablement
The availability of any asset collection is determined by what is (a) licensed and (b) configured under Server Administration. To install a license or to view the currently licensed features, see Setup > Product Registration. To configure which licensed collection types are currently enabled or disabled, see Setup - EDG Configuration Parameters.
For general licensing information and available asset collections and packages, see the TopQuadrant website.
Overview of Reference Datasets
Reference datasets contain standardized data or codes, which typically are used by various applications as lists or tables. In fact, they are often called “code tables.” An individual code table may seem like a simple thing, but a well-managed collection of code tables and related reference data spread across an enterprise is a resource that can bring great value to that enterprise—or cause great problems if it is not well maintained. EDG lets you control your reference data so that you can put it to work for you as efficiently as possible.
EDG datasets are much more than just flat code tables. Reference data in different datasets can have relationships. For example, as currencies are associated with countries, currency codes have a relationship (connection) to country codes. Reference datasets can also model structural relationships in data, such as hierarchies of industrial categories, locations, or product types. Finally, you can capture any additional information you need to have about each code. And reference datasets themselves provide a lot of rich information or metadata such as the source of a dataset, how it is managed, where it is being used, and the meaning of each data field.
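As a minimal sketch of how codes in one dataset can relate to codes in another, the following Python models two tiny code tables and checks that every currency points at a known country. The table contents, the `used_in` field name, and both helper functions are illustrative assumptions; none of them are part of EDG.

```python
# Illustrative data only; real reference datasets live in EDG, not in Python dicts.
COUNTRIES = {
    "US": {"label": "United States"},
    "JP": {"label": "Japan"},
    "FR": {"label": "France"},
}

# Each currency code carries a relationship ("used_in", a made-up name)
# to one or more country codes.
CURRENCIES = {
    "USD": {"label": "US Dollar", "used_in": ["US"]},
    "JPY": {"label": "Yen", "used_in": ["JP"]},
    "EUR": {"label": "Euro", "used_in": ["FR", "DE"]},  # "DE" is deliberately missing above
}


def dangling_references(currencies, countries):
    """Return (currency, country) pairs whose country code is not defined."""
    return [
        (currency, country)
        for currency, row in currencies.items()
        for country in row["used_in"]
        if country not in countries
    ]


def currencies_for(country, currencies):
    """Follow the relationship in the reverse direction: country -> currencies."""
    return sorted(code for code, row in currencies.items() if country in row["used_in"])
```

Treating the relationship as data makes it possible to detect a currency that refers to an undefined country code, which is the kind of inconsistency that separately maintained flat code tables tend to hide.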
The tabular editors of EDG collections (for searching, viewing, and editing assets) require the underlying schema to be backed by SHACL. To migrate a collection’s included ontologies to a SHACL basis, see Ontology Utilities > Convert OWL Axioms to SHACL Constraints.
Reference datasets no longer allow classes without a primary key property to be used as their main entity.
For additional perspectives and details on reference data management and related topics, see these TopQuadrant whitepapers.
Reference datasets are used with ontologies, which define the data schema (classes, properties, relationships, constraints) of the reference dataset items. For example, you might define a class (or entity) called Gender in an ontology and then, in a reference dataset that uses this ontology, enter the values Male and Female as instances of this class. Ontologies thus define the data attributes for each entity and the relationships between entities.
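The split between schema and instances can be sketched in plain Python. This is a conceptual illustration only: `EntityClass`, `ReferenceDataset`, and the `code` and `label` properties are hypothetical names, not EDG APIs, and in EDG the schema itself is expressed in an ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EntityClass:
    """Hypothetical stand-in for an ontology class (the schema side)."""
    name: str
    properties: Tuple[str, ...]
    primary_key: Optional[str] = None


@dataclass
class ReferenceDataset:
    """Hypothetical container for the instances of one main entity class."""
    main_entity: EntityClass
    rows: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def __post_init__(self):
        # Mirrors the rule that the main entity needs a primary key property.
        if self.main_entity.primary_key not in self.main_entity.properties:
            raise ValueError(f"{self.main_entity.name} has no primary key property")

    def add(self, **values):
        """Add one instance, checking it against the schema."""
        unknown = set(values) - set(self.main_entity.properties)
        if unknown:
            raise ValueError(f"properties not in schema: {sorted(unknown)}")
        key_property = self.main_entity.primary_key
        if key_property not in values:
            raise ValueError(f"missing primary key property: {key_property}")
        key = values[key_property]
        if key in self.rows:
            raise ValueError(f"duplicate key: {key}")
        self.rows[key] = values


# The ontology side: a Gender class with two assumed properties.
gender = EntityClass("Gender", ("code", "label"), primary_key="code")

# The reference dataset side: Male and Female entered as instances.
dataset = ReferenceDataset(gender)
dataset.add(code="M", label="Male")
dataset.add(code="F", label="Female")
```

Keeping the class definition separate from the rows means several datasets could share one schema, which is the role the ontology plays for a reference dataset.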
TopBraid EDG makes it possible for you to:
- Reduce independent maintenance of code tables: If different departments use the same code table, they may be maintaining individual copies of it on spreadsheets being emailed around to each other. When they all use the same copy, changes are coordinated, and they can be confident that they’re using the right codes.
- Reduce data quality problems due to coding errors: Workers who don’t have access to recent, correct codes can’t always enter the proper values, and improper values can lead to lost revenue.
- Reduce the cost of designing code tables for databases: When new code tables have similarities or other relationships to other tables, these relationships can be leveraged in the design of the new tables. Well-organized, searchable metadata about which applications use which code tables also makes it easier to coordinate new and legacy tables.
- Reduce data integration issues due to inconsistent codes: The inconsistencies caused by maintaining multiple copies of the same code tables, or by using copies that were updated at different times, can lead to problems when combining datasets that reference these tables. Consistent tables mean easier data integration.
- Make informed decisions based on code table data: Code table entries are often cryptic abbreviations, leaving people to guess what they mean and which ones to use when. Metadata such as definitions and provenance information helps people use the right codes in the right places.
Overview of Big Data Assets
Big Data Assets specify the data structures, jobs, nodes and other software and hardware components that make up a big data ecosystem.
Overview of Content Tag Sets
Content Tag Sets are used for tagging content using vocabularies managed in EDG. Users can tag (assign metadata to) content through a visual user interface that displays the context for both the content and the vocabulary. They can also run Tagger’s auto-classification capability to automatically assign relevant tags to content and review the results.
Overview of Corpora
A Corpus is a collection of read-only textual items (such as documents, excerpts, and websites) along with their associated metadata. The original items are always imported from external sources, such as content management systems or websites, and are never created or edited within EDG. Thus, a version of the Tagger interface allows viewing of Corpus content, without editing. The textual content of Corpus items provides the foundation for manual or automated tagging and annotation with Content Tag Sets (serving as the content graph).
Overview of Crosswalks
Crosswalks let you create connections between the terms in two different vocabularies. This is especially useful for defining connections between two different standard vocabularies or between a standard one and a specialized local one. Applications can use saved crosswalk connection data to enhance the use of either vocabulary by taking advantage of the connected data and metadata for search, classification, and other operations.
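How an application might use saved crosswalk connections can be sketched as follows. The `Crosswalk` class, the `expand_query` helper, and the sample terms are assumptions made for illustration; they do not describe how EDG stores or exposes crosswalk data.

```python
from collections import defaultdict


class Crosswalk:
    """Hypothetical in-memory crosswalk between two vocabularies."""

    def __init__(self, pairs):
        self._forward = defaultdict(set)
        self._reverse = defaultdict(set)
        for source_term, target_term in pairs:
            self._forward[source_term].add(target_term)
            self._reverse[target_term].add(source_term)

    def to_target(self, source_term):
        """Terms in the target vocabulary connected to a source term."""
        return sorted(self._forward.get(source_term, ()))

    def to_source(self, target_term):
        """Terms in the source vocabulary connected to a target term."""
        return sorted(self._reverse.get(target_term, ()))


def expand_query(terms, crosswalk):
    """Add connected terms from the other vocabulary to a list of search terms."""
    expanded = list(terms)
    for term in terms:
        for mapped in crosswalk.to_target(term):
            if mapped not in expanded:
                expanded.append(mapped)
    return expanded


# A specialized local vocabulary connected to a (made-up) standard one.
local_to_standard = Crosswalk([
    ("laptops", "Portable computers"),
    ("notebooks", "Portable computers"),
    ("phones", "Mobile telephones"),
])
```

Indexing the connections in both directions lets either vocabulary benefit from the other, for example by expanding a search on a local term with its standard equivalent.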
Overview of Data Assets
Data Assets support the specification of data objects that make up a data ecosystem. Data assets include databases, database columns and tables, data elements, datasets and their dataset type specifications, and logical and physical models.
Overview of Data Graphs
Data Graphs are general collections, used for instances of classes defined in user-determined ontologies.
Overview of Datatypes
Datatypes support the specification of scalar datatypes, structured datatypes, scales and code lists. Scalar datatypes include all of the Oracle data types. Structured datatypes provide for the definition of arrays, lists and other composite datatypes. Code lists are used to specify enumerated values that need not be governed as reference data, such as status values.
Overview of Enterprise Assets
Enterprise Assets enable specification of the metadata associated with an enterprise’s business areas, activities, functions, roles, capabilities, processes, and information assets such as documents and reports.
Overview of Glossaries
A glossary is a collection of terms in a particular domain (i.e., field or subject) of knowledge, together with the definitions for those terms. Unlike dictionaries, which are more general collections of words, glossaries only concern themselves with terms that will enhance one’s comprehension of a certain topic. Glossary terms are often highly specific to a particular business subject or area of operation. They can be thought of as jargon. Just about any business activity or organization you can think of has its own jargon, from professional disciplines to operational activities.
A business glossary goes beyond just a list of terms. Linking terms to IT assets establishes a connection between business and IT and enhances organizational collaboration. Glossaries let you create and manage a common vocabulary of terms important to your organization to ensure clear communication and improve productivity. These terms can be categorized in a way that is relevant to your organization. Multiple glossaries can be developed, interlinked, searched and explored. Valid definitions of values and business rules can then be managed and made available across the organization.
Glossary terms can be a ‘flat’ alphabetical list or they can be organized into hierarchies. In TopBraid EDG, the primary user interface (UI) for viewing and editing glossary terms looks like a spreadsheet. When working with glossaries, users can also switch into a hierarchical view and pick any defined relationship between terms to present them as a taxonomical tree.
TopBraid EDG also supports a category of vocabularies called “taxonomies”. Depending on your licensing, you may see this category of asset collections in your installation of EDG and, as a result, may wonder about the difference between glossaries and taxonomies and when to use each, especially if your terms are organized hierarchically. Unlike glossaries, taxonomies always assume that terms are organized and presented as hierarchies and that the hierarchical relationship between them is “broader concept”, which is defined by the SKOS standard. SKOS doesn’t use the word “term” that is common to business glossaries. Instead, it uses the word “concept”.
Even more importantly, glossaries are designed to improve understanding of data’s context and usage. Glossary terms not only have descriptions of their meaning, but also define the business context of use and can be linked to the underlying technical metadata to provide a direct association between business terms and data sources and data elements. In TopBraid EDG, glossary terms include descriptions of business rules and permissible values, both in plain English and as structured, executable rules that are used to automate connections between data elements and business terms. They may also connect to reference datasets and enumerations that hold lists of values specific to a given term such as “customer status”.
Taxonomies, on the other hand, describe some domain of knowledge in general. They are often focused on providing a rich set of synonyms that are used in search, text mining and document classification.
Main Classes of a Glossary
|Glossary Term|Business term|Business terms are a subcategory of glossary terms. Business terms are related to business, for example, words and phrases about human resources, finance or business operations. Some business terms and their meanings are very specific or unique to a given organization or a department within an organization. For example, the term ‘high value customer’ may be defined differently by two different organizations.|
| |Industry term|Industry terms are a subcategory of glossary terms. Industry terms represent industry terminology that tends to be common across all or most businesses operating in a specific industry. Industry terms often come from standards bodies, consortiums and organizations external to a business.|
| |Technical term|Technical terms are a subcategory of glossary terms. Technical terms are typically related to technologies and may be connected to or used to describe an organization’s technical assets.|
Overview of Lineage Models
Data lineage describes what happens to data as it goes through diverse processes.
If the assets you are governing are connected using relationships that describe data lineage, you will be able to click on a given asset and see its lineage or impact across the enterprise by selecting the corresponding option from the visualization menu. Lineage shows the specifics of how data flows between data sources and applications to users, in support of business activities and functions, and the enablement of enterprise capabilities. Information is presented using an interactive diagram called LineageGram.
One example of such a diagram is shown below. It presents business (aka logical) lineage with all systems that feed into one organization’s CRM Platform in the context of a product registration activity and associated information.
It is fairly common in an enterprise to have many different connections between individual assets where connections belong to different contexts serving different purposes. For example:
- A CRM system is fed with customer information not only as a result of product registration, but also as a result of other CRM processes and activities such as marketing or customer support.
- Similarly, employee information may flow between HR-related applications in the context of HR processes, and it may also flow between applications that are used for customer support or customer acquisition in the context of CRM processes.
These are different enterprise flows. If they were all captured and we did not specify a context of interest, asking for the lineage of a system, a data source or a data element would display not only the dependencies in play for product registration, but also many other links and feeds, making it hard for users to understand lineage as it relates to a given business activity. Seeing this fuller picture can be important, especially for impact analysis, and EDG provides it. However, users analyzing data lineage will often want a focused and bounded exploration of lineage. To support this, EDG lets you create Lineage Models as separate asset collections to capture contextualized lineage relationships. The role of the Lineage asset collections is to contain the context-specific relationships between data, applications and other assets. Each collection can store lineage for one or more enterprise flows. It often makes sense to keep lineage for related processes (e.g., HR) within the same collection.
EDG also includes an asset type called Lineage Model. It shares its name with the asset collection that is intended to store lineage information, but it is an entity of its own that is created within the collection. Its use is optional and its role is to provide a convenient “starting point” for users who want to explore lineage. A Lineage Model asset is linked to assets that participate in the lineage using one of the relationships that have been designed for this purpose, for example, “uses software executable”, as shown in the screenshot below.
Only the last application or data source in the lineage chain needs to be linked to a Lineage Model asset, as shown above. When users click on a Lineage Model asset collection, they will see the Lineage Model assets presented in a table. They can select one and choose an option to show the Lineage diagram.
With this, we see a full flow for product registration information, from the beginning to the end. As shown below, it does not stop with the CRM Platform, but continues into a Data Warehouse and ultimately a Reporting and Analysis Toolset.
To learn more about the full scope of lineage supported by EDG, the kinds of relationships used to capture lineage information, and the capabilities of the LineageGram, click on the Lineage Model link in the blue left-hand navigation bar and follow the link in the text “A tutorial explaining EDG’s visualization of Lineage Models can be accessed at this page” to reach the interactive tutorial.
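The difference between full lineage and context-bounded lineage described in this section can be sketched as a small graph traversal. The flow list, the `upstream` function, and the asset names “Registration Web Form”, “Support Ticketing”, “HR System” and “Payroll” are invented for illustration; EDG captures lineage through its own relationships and renders it in the LineageGram.

```python
from collections import deque

# (source, target, context) triples. The CRM Platform, Data Warehouse and
# Reporting and Analysis Toolset echo the example in the text; everything
# else is made up.
FLOWS = [
    ("Registration Web Form", "CRM Platform", "product registration"),
    ("CRM Platform", "Data Warehouse", "product registration"),
    ("Data Warehouse", "Reporting and Analysis Toolset", "product registration"),
    ("Support Ticketing", "CRM Platform", "customer support"),
    ("HR System", "Payroll", "HR"),
]


def upstream(asset, flows, context=None):
    """Return all assets that feed `asset`, optionally limited to one context."""
    feeders = {}
    for source, target, flow_context in flows:
        if context is None or flow_context == context:
            feeders.setdefault(target, []).append(source)

    # Breadth-first walk backwards along the flows, starting from the asset.
    seen, queue = [], deque([asset])
    while queue:
        current = queue.popleft()
        for source in feeders.get(current, []):
            if source not in seen:
                seen.append(source)
                queue.append(source)
    return seen
```

Starting the walk from the last asset in the chain matches the idea that only the final application or data source needs to be linked to a Lineage Model asset. Passing no context returns the fuller picture that is useful for impact analysis.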
Overview of Ontologies
Ontologies are general data models that define classes and their properties: attributes, which store literal values, and relationships, which capture associations between classes.
Ontologies are special asset collections in EDG because all other asset collections are based on ontologies. In other words, ontologies define the schema for data that is contained in other collections. These are either ontologies that are pre-packaged with EDG or, in the case of Reference Datasets and Data Graphs, ontologies defined by users of EDG. Even for collections that are based on pre-packaged ontologies, users of EDG will commonly want to do some additional modeling to extend and modify these ontologies.
Although an ontology can also contain instances of its classes, it is typically best to keep an ontology’s instances in a separate model, such as in a Reference Dataset or a Data Graph. However, because ontologies could contain class instances, this section also includes descriptions of how to create, edit, and delete ontology instances.
TopBraid EDG supports a modular approach to defining ontologies. You can create a large single ontology model or you can create more granular ontology modules. These modules can include each other as needed. Using modules allows more granular access control. It also allows you to differentiate between entities used across the enterprise and entities used in particular parts of the business and govern them accordingly.
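The modular approach can be pictured as a graph of includes. The sketch below assumes that includes are followed transitively, so a module also sees what its included modules include. The ontology names and the `included_closure` helper are hypothetical.

```python
# Which ontology module directly includes which others (made-up names).
INCLUDES = {
    "HR Ontology": ["Enterprise Core Ontology"],
    "Finance Ontology": ["Enterprise Core Ontology", "Currency Ontology"],
    "Enterprise Core Ontology": ["Units Ontology"],
}


def included_closure(collection, includes):
    """Return every module reachable from `collection` through includes."""
    result, stack = [], [collection]
    while stack:
        current = stack.pop()
        for included in includes.get(current, []):
            # Skip modules already collected and guard against include cycles.
            if included not in result and included != collection:
                result.append(included)
                stack.append(included)
    return sorted(result)
```

In this picture, entities used across the enterprise sit in a shared core module, while department-specific entities live in modules that include it and can be governed separately.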
Overview of Requirements Assets
Requirements contain specifications such as data requirements, regulatory requirements and security requirements and support traceability for other EDG-managed assets.
Overview of Taxonomies
A taxonomy is a vocabulary collection based on SKOS, the W3C standard ontology designed for representing taxonomies, thesauruses, and subject heading schemes.
In EDG, Taxonomies are SKOS-based datasets, whereas other datasets typically do not include SKOS, although they may. When a new taxonomy is created, EDG automatically includes the SKOS ontology. SKOS provides a description of concepts and their properties, e.g., fields like preferred and alternative labels, various notes and relationships.
The edit application for taxonomies is designed to offer SKOS-based features, such as displaying a SKOS model’s Concept Hierarchy (where concepts are connected by SKOS “broader” statements) starting with the defined Concept Schemes as hierarchical roots. Every taxonomy must have at least one concept scheme that identifies “top concepts” in a scheme. A taxonomy can have multiple concept schemes; however, a better practice can be to use a single concept scheme per individual taxonomy. Since taxonomies, like all asset collections, can include each other, a taxonomy with multiple concept schemes can be assembled through inclusion of several single concept scheme taxonomies.
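The way a concept hierarchy follows from “broader” statements and top concepts can be sketched without any RDF tooling. The data and the `concept_tree` function below are illustrative assumptions; in a real taxonomy these statements would be SKOS triples such as `skos:broader` and `skos:hasTopConcept`.

```python
from collections import defaultdict

# (narrower concept, broader concept) pairs, in the spirit of skos:broader.
BROADER = [
    ("Europe", "World"),
    ("Asia", "World"),
    ("France", "Europe"),
    ("Japan", "Asia"),
]

# Top concepts per concept scheme, in the spirit of skos:hasTopConcept.
TOP_CONCEPTS = {"Geography Scheme": ["World"]}


def concept_tree(scheme, top_concepts, broader_pairs):
    """Return indented lines for the hierarchy rooted at a scheme's top concepts."""
    narrower = defaultdict(list)
    for child, parent in broader_pairs:
        narrower[parent].append(child)

    lines = [scheme]

    def walk(concept, depth, path):
        lines.append("  " * depth + concept)
        for child in sorted(narrower.get(concept, [])):
            if child not in path:  # guard against accidental cycles
                walk(child, depth + 1, path | {child})

    for top in top_concepts.get(scheme, []):
        walk(top, 1, {top})
    return lines
```

The concept scheme acts as the root of the display, and each “broader” statement is read in reverse to find the children of a concept.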
Use of the pre-defined SKOS fields can be configured locally for a given taxonomy, globally for the entire EDG, or for a subset of taxonomies, e.g., all taxonomies associated with some business area. You can disable fields for all or a group of taxonomies by taking advantage of the Ontology modeling functionality in EDG. Ontologies are also used to define custom, non-SKOS properties for taxonomy concepts and to define specializations (subclasses of concepts). If you customize SKOS, you need to make sure that the ontology with your customizations is included in the taxonomies these customizations apply to. There is a system-wide setting your EDG administrator can use to ensure that the customized model is included in every new taxonomy. If a customization only applies to a subset of taxonomies, the creator of a taxonomy can use the Includes dialog to include the desired customized model.
Overview of Technical Assets
Technical Assets support the specification of systems, applications and technical hardware like servers and networks.