TopBraid EDG lets you load metadata and data from external sources such as spreadsheets, RDBMs, DDL files, JSON, RDF files, XML, RSS feeds, SPARQL endpoints and other formats – with flexible mapping to EDG models.

Pre-built import options are available from the Import page (tab) for an asset collection or a workflow (working) copy of an asset collection. These options are described below. They will vary depending on the type of collection. You will only see the Import tab and the corresponding page if you have edit permissions.

You may see more import options than described here. You may also see fewer:

  • Additional importers can be configured using Import modules that are part of TopBraid EDG’s platform. If your organization configures a new import to appear on the Imports page, you will see it listed as one of the available options alongside the pre-built importers.
  • Your organization may also deactivate some or all of the Import page options. This decision can be made on a per-collection basis on the Manage page (tab). In this case, you will see fewer import operations than described here.

In addition to being available on the Import page, all import operations can be executed as services.

Dynamic imports from Linked Data sources, including refreshing of previously loaded values, are executed using options available on the Transform tab and its corresponding page.

Import of documents and associated metadata into Corpora asset collections is performed dynamically according to the connector option selected when creating a corpus.

Imports described here are effectively “pull” operations. In other words, EDG is pulling data through various mechanisms and connectors. EDG is an open solution and a very powerful alternative is for external systems to “push” data to EDG. This can be done by updating EDG through its GraphQL and SPARQL Endpoints.
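As an illustration of the “push” approach, an external system can POST a SPARQL 1.1 update to a SPARQL endpoint over plain HTTP. The Python sketch below builds such an update; the endpoint URL and example URIs are hypothetical, and the actual endpoint path and authentication depend on your EDG installation.

```python
from urllib import request

def build_sparql_insert(triples):
    """Build a SPARQL 1.1 INSERT DATA update from (subject, predicate, object) URI triples."""
    lines = [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]
    return "INSERT DATA {\n  " + "\n  ".join(lines) + "\n}"

def push_update(endpoint_url, update):
    """POST the update to a SPARQL endpoint, per the SPARQL 1.1 Protocol."""
    req = request.Request(
        endpoint_url,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
    return request.urlopen(req)  # performs the actual network call

update = build_sparql_insert([
    ("http://example.org/Airport/SFO",
     "http://example.org/ontology#airportCountry",
     "http://example.org/Country/US"),
])
# push_update("https://edg.example.org/tbl/sparql", update)  # hypothetical endpoint URL
```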

Audit Trail for Import Operations

When importing directly into an asset collection:

The Record each new triple in change history checkbox is available for each import operation. It gives you the option of recording each individual change in the change history. Note that it is left unchecked by default, and checking it is not recommended when importing large amounts of data. When this option is left unchecked, the change history will contain a record capturing that an import was executed, but will not contain all individual additions and deletions of data.

Further, using the Manage page (tab) for an asset collection, you can activate the Record Triple Counts only option. If activated, the change history will record only the numbers of added and deleted triples instead of the details of each triple. This can significantly reduce the size of the change history. This choice takes precedence over selecting Record each new triple in change history during a specific import operation.

When the import is done as part of a workflow:

Every change will always be added as an entry in the change history, where it will be available to all the relevant reports. A decision not to retain this detailed history upon completion of the workflow can be made when designing a workflow template.

Further, using the Manage tab, you can activate the Archive Working Copies on Commit option. If activated, the detailed change history for committed working copies will be automatically archived. Note that this takes precedence over the Record Triple Counts only option, i.e., the archive will contain the full history before it is compacted.

File Import Reports. To see the history of all file imports for a collection, select the Reports tab and then File Import Reports. This report also shows any currently running file imports.

 

Import Tab

 

Operations on the Import tab let you load data into a given asset collection from external sources such as RDF files, spreadsheets, etc. You will only see the Import tab and its operations if you have edit privileges.

 

Import RDF File

 

Any asset collection can import data from an external RDF file (in a serialized format). The Import > Import RDF File link shows a screen where the Choose File button opens a dialog for picking the external source file.

 

Choose the source file and identify its format. Your file can be compressed. Then, choose among the following options:

  • When importing, you can decide whether to record new triples in the change history (use with care!). If importing into a Working Copy, history will always be recorded and this option is greyed out.
  • You may also be able to select a streaming import.  This option is available only to users with the Manager permission for a given asset collection and EDG Administrators. It is not available for import into Working Copies.
  • You can also validate a file without importing it. This validation will be performed not just on the data in the file, but on the combined data – what is already in EDG together with what would be loaded should the file be imported – because some problems become apparent only in the merged data.

Click Finish to complete the operation. A message will indicate whether the import was successful. For large imports, this process may take several minutes; check the status under File Import Reports on the Reports tab.

Note that if your file contains any schema definition (classes, properties, shapes), you will only be able to import it into an Ontology. If you have both instances and schema data in one file, either split the file before import or follow instructions in Copy or Move Instances from Other Asset Collection.

When importing RDF files into an Ontology or into a Taxonomy, EDG will perform some transformations (unless the streaming import is chosen):

  • For ontologies, “subclass of Thing” statements will be added for classes that have no parents. This is done to ensure that these classes are visible in the Class Hierarchy.
  • For taxonomies,  “narrower concept” relationships will be used to generate inverse “broader concept” relationships. This is done to ensure that such concepts are visible in the Concept Hierarchy.
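The two transformations above can be sketched in Python over simple (subject, predicate, object) tuples. The shorthand prefixed names stand in for full RDF terms; EDG’s actual import operates on parsed RDF, not tuples.

```python
def add_hierarchy_fixups(triples):
    """Illustrative sketch of the two import-time transformations described above."""
    result = list(triples)
    # Ontologies: classes with no declared parent get "subclass of Thing".
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    with_parent = {s for s, p, o in triples if p == "rdfs:subClassOf"}
    for cls in sorted(classes - with_parent):
        result.append((cls, "rdfs:subClassOf", "owl:Thing"))
    # Taxonomies: each "narrower" statement yields the inverse "broader" statement.
    for s, p, o in triples:
        if p == "skos:narrower" and (o, "skos:broader", s) not in result:
            result.append((o, "skos:broader", s))
    return result

out = add_hierarchy_fixups([
    ("ex:Animal", "rdf:type", "owl:Class"),
    ("ex:Fruit", "skos:narrower", "ex:Apple"),
])
```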

When importing RDF into a Working Copy, the addition of each triple will be recorded as an entry in the change history, where it will be available to all the relevant reports. When importing into a Production Copy, the Record each new triple in change history checkbox gives you the option of adding these to the change history; note that this is not recommended when importing large amounts of data.

As mentioned above, Managers and Administrators have an option to stream RDF directly into the collection. This allows content to be imported much more quickly and with less memory. It should only be used for large imports when you are confident that you do not need to validate or clean up the data. It is best to perform a backup of the workspace prior to importing with direct streaming, or to use a workflow; should anything go wrong, you can then revert.

Import Spreadsheet using Template

 

Any asset collection except for Crosswalks, Content Tagsets and Corpora can import data from a spreadsheet using a template. Clicking on this option lets you pick a spreadsheet and a template that will be used to convert the spreadsheet data into reference data. The template may be created using the mapping process explained in the Import Spreadsheet using Pattern section, below.

The mapping can also be created using TopBraid Composer when the simple mapping described above is insufficient and you need to perform more complex transformations–for example, concatenation of values. TopBraid Composer’s SPINMAP tool provides a drag-and-drop interface that makes it especially easy to create more complex mappings.

Templates developed with TopBraid Composer must be stored in files with “.tablemap.” in their name (for example, myMapping.tablemap.ttl) and be uploaded to the EDG server to be available to EDG users. A spreadsheet imported using a template must have exactly the same structure as the spreadsheet used to develop the template: the names and order of the columns must be identical. If multiple worksheets are used, the order and structure of each worksheet (even worksheets that are not imported) must also be the same.

 

Import Spreadsheet using Pattern

 

Any asset collection except for Crosswalks, Content Tagsets and Corpora can import data from a spreadsheet using a pattern. To use this option, the worksheet you will import must have a header row with the names of columns. EDG will find the first row with data and will assume that it is the header row.

The Import > Import Spreadsheet Using Pattern link shows the following screen:

 

Click Choose File to pick the spreadsheet file with data you want to import. This may be an Excel file (file-type extensions: .xls or .xlsx), a tab-separated value (.tsv) file, or a comma-separated value (.csv) file. The file-name should have the expected extension. Because an Excel file may have more than one sheet of data, this screen lets you specify a sheet index value to identify which sheet to read in. The default is 1, for the first sheet.

The sheet index counts all sheets in an Excel workbook, including hidden ones. For example, if you enter a 3 here and EDG seems to import the second sheet, there may be a hidden one between the first and second sheet that made the third one look like it was the second one when Excel was displaying the workbook. The Excel online help explains how to check for the existence of hidden sheets.

The Entity type for the imported data field identifies what type of assets you will be importing. Each row in the spreadsheet will be brought into EDG as an instance of the selected class, and you will be able to map spreadsheet columns to the declared properties of the class. Make the (1) file, (2) sheet index, and (3) entity type (class) selections and click Next.

Note: All assets you are importing will be given the same type. If you need to import assets of different types, you will need to either import the same spreadsheet multiple times with different mappings, or bulk-edit the data after import.

 

Select Spreadsheet Type

 

This view enumerates five possible (column-wise) layout patterns for the spreadsheet’s row-item data, showing an example of each pattern.  

The No Hierarchy layout (#1) is the most common and the simplest to use. We recommend that users become familiar with it before using other patterns.

Note: You can use the No Hierarchy import pattern to create relationships in the imported data, including taxonomic/hierarchical relationships. You can indicate which column will be used to find already existing hierarchical parents. If the parents do not exist prior to the import, you will need to import them first – without mapping information about children. With this approach you will likely need to run the import multiple times using different mappings.

For data explicitly structured into a hierarchy, like a taxonomy, there are four “hierarchical” layout options. The main difference is that with the No Hierarchy pattern, when looking for the hierarchical parents of the items being imported, EDG will use as parents only resources that already exist in the asset collection. When using hierarchical patterns, EDG will connect children to the hierarchical parents identified in the spreadsheet even if they did not exist prior to the import, creating the parents as needed. In other words, children and parents can be imported and connected in a single import.

In the hierarchical layouts, each row item also indicates its hierarchical path, either explicitly (absolute path: #2, #3, #4) or implicitly (recursive path: #5). Note that lighter text in the layout patterns indicates optional data. These are complex import patterns that behave differently depending on the URI construction method set for your asset collection. Depending on your URI construction method, you may not be able to use them.

 

Note the header row of column labels in every layout. The imported spreadsheet should always have a header row.

 

Below the five layout options, the view will show a sample of the spreadsheet’s actual data.  The following image shows a spreadsheet of airport codes.

Select the layout title link that most correctly corresponds to the structure of your spreadsheet.

 

Import No Hierarchy Spreadsheet

 

This view will let you define the data-mapping rules from the spreadsheet’s columns into the class. You will also be able to (1) preview the import, (2) save its settings as a template for future imports, and (3) initiate the import processing.

You will see two sections:

  • Column Mappings
  • Unique Identifiers

You will also see several options below the Unique Identifiers section. And, as on the previous page, for convenience, you will see an example of your spreadsheet’s data.

When working with Taxonomies, you will have an option to select an already existing Concept Scheme. If you are importing into a Taxonomy and do not select a concept scheme, EDG will create one for you using the name of the spreadsheet. All concepts that have no broader parent and are not defined as top concepts of some scheme will be made top concepts of the selected or automatically created scheme.

 

Column Mapping Section

 

The Column Mapping section lets you specify which spreadsheet columns correspond to which properties in the target assets. The target properties you will be able to select are those that belong to the target entity type (class or asset type) you have identified on the previous page. When column names are close to the property names, EDG will automatically propose the mapping. When mapping to relationships, you will be able to indicate inverse relationships to the target entities.

Any unmapped columns will be ignored during the import; their data will not be imported.

The following image shows an example Column Mapping.

 

 

When importing attribute values, the mapped spreadsheet column should contain cell-values that match the datatype of the target attribute. If, as part of the mapping, you select an optional Language value, import will add the selected language tag.  

If the imported rows will create any new instances, as opposed to only adding data to existing ones, then one of the spreadsheet columns should be mapped to the label property, which is used as the display name.

When importing relationships, EDG needs to understand which existing resources to link to. Alternatively, it can sometimes create a URI as the value of the relationship even if a resource with that URI does not yet exist in the asset collection.

After you select a relationship from the dropdown, you will have the following methods for directing EDG on how to build relationship values:

  1. If the related class has a designated primary key constraint, no additional information is needed. EDG will use the data in the mapped column to form the URIs of related resources according to the primary key definition. This option is demonstrated in the above screenshot – airport country is a relationship from Airports to Countries, and our ontology defines a primary key for the class Country.
    • This means that values in the mapped column must correspond to the property used as the key, e.g., a primary key for countries.
    • In our example, Country has a primary key that uses 2-character country codes. Thus, we mapped the Country Code column in the spreadsheet to the “airport country” relationship and did not need to provide any additional information, as long as this spreadsheet column contains 2-character country codes.
    • Imported rows will always construct a reference, regardless of whether referenced resources already exist in EDG
  2. If all the values in the mapped column are recognized as valid URLs, then they can be used “as-is” as URIs of related resources, as indicated by the associated Use values as URIs label.
  3. If neither option 1 nor option 2 fits your situation, you will see an option to select a property of the related class to match on. This property will be used to find resources at the end of the relationship.
    • For example, if Country did not have a primary key, we could map either Country or Country Code to the “airport country” relationship.
    • If we mapped the Country column, we would then select “short name” as the property whose values are used to identify the related resources.
    • If we mapped the Country Code column, we would select “ISO 3166-2 alphabetic country code” as the property whose values are used to identify the related resources.
    • Unlike the first two options, this method will create a relationship ONLY if it finds an already existing resource it can connect to.
    • Values of the matching property must be unique across all referenceable instances. If duplicate values are found, then the related resource will be assigned arbitrarily.
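The difference between option 1 (primary key) and option 3 (match on a property) can be sketched as follows. The namespace, URI shapes, and property names here are illustrative simplifications, not EDG’s actual internals.

```python
def resolve_by_primary_key(namespace, key_value):
    """Option 1: form the target URI directly from the primary-key value.
    The URI shape is hypothetical; EDG follows the primary key definition."""
    return namespace + key_value

def resolve_by_matching(existing, match_property, cell_value):
    """Option 3: find an already existing resource whose match_property equals
    the cell value; return None when there is no match (no link is created).
    `existing` maps resource URIs to {property: value} dictionaries."""
    matches = [uri for uri, props in existing.items()
               if props.get(match_property) == cell_value]
    # Values must be unique; with duplicates the choice would be arbitrary.
    return matches[0] if matches else None

countries = {
    "ex:Country/US": {"short name": "United States", "code": "US"},
    "ex:Country/FR": {"short name": "France", "code": "FR"},
}
```

Option 1 always constructs a reference, whether or not the target exists; option 3 only links to resources it actually finds.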

Option 3 is demonstrated in the screenshot below – after removing the primary key from the class Country.

 

For inverse relationships, the spreadsheet column represents links from instances of some other class to the resources that are being imported. Similar to forward (non-primary-key) relationships, if an inverse relationship is the chosen mapping, then there is a further choice of which referring-class property to use to identify the referring instances.

Note: 

  • As explained below under Other Parameters, if the target of a relationship has the same entity type as the entity type chosen for the imported data AND you are using matching on any property values to build a relationship (in other words, using Option 3), the Override existing values option must be unchecked. Otherwise, the relationship will not be created.
  • When importing into Reference Datasets, one of the spreadsheet columns must map to the primary-key property that is designated for the dataset’s main entity class. For example, the screen image above identifies this field as the IATA code.
  • When importing into Taxonomies, you will be able to select a concept scheme that should contain imported concepts. Otherwise, EDG will create a new concept scheme and make all concepts that have no parents in the spreadsheet its top concepts.

Unique Identifiers

This section tells EDG how to generate URIs for the imported data.

If a class you are importing into has a primary key constraint, you will not need to make any selections in this section. Instead, make sure that you use the Column Mapping section to map one of the spreadsheet columns to the primary key property.

Otherwise, you will need to make some selections. Available selections depend on the URI construction method used for a given asset collection.

If the URI construction method is based on Labels, you will be able to specify the column(s) to be used to generate URI for each imported row – as shown below.

Typically, you will map a column containing the label to the Id column #1, leaving the rest of the mappings empty. However, if desired, you could also select a different column to generate URIs, or even a combination of columns. When multiple columns are selected, EDG will concatenate their values to form URIs.

You will also see a field for the namespace to be used to generate URI for each imported row – Start of URIs. It is pre-populated using the default namespace of the collection, but can be changed.

When adding new information to already existing resources, you must make URI choices that will match URIs of your pre-existing resources.
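The concatenation behavior for label-based URIs can be sketched as follows. The character escaping shown is a deliberate simplification, not EDG’s exact algorithm, and the namespace is an example value.

```python
def label_based_uri(namespace, *cell_values):
    """Sketch of label-based URI construction: concatenate the values of the
    selected columns and append them to the collection namespace."""
    label = "".join(cell_values)
    # Replace characters that are not URI-safe; hypothetical simplification.
    safe = "".join(c if c.isalnum() else "_" for c in label)
    return namespace + safe
```

With a single label column, `label_based_uri("http://example.org/", "San Francisco")` yields a URI ending in `San_Francisco`; with multiple columns, the values are simply joined in order.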

If the URI Construction method of the asset collection is based on either the Counter or UUID, you will get a different set of options – as shown below.

If you are creating new resources, leave the selection empty; EDG will generate URIs for you according to the currently selected URI construction method, using the default namespace for the URIs.

If you are adding information to resources that already exist in EDG:

  • You can, in your spreadsheet, have a column that contains URIs previously assigned by EDG and select this column as the URI column.
  • Alternatively, you can match on some property to find resources that already exist. Values of this property must be unique for the entity type. If duplicate values are found, assignments will happen arbitrarily.

Other Parameters

These are located directly below the Unique Identifiers section.

Selecting Override existing values will delete an existing value for a mapped property before adding its new (different) value; otherwise, new values will be added alongside existing ones.

If the imported rows are adding new data values to existing instances and/or adding new instances, it is best to make sure that you uncheck the Override existing values option. Checking this option has the following consequences:

  • If an instance already exists and has a value for any of the mapped columns, the value will be replaced with data coming from a spreadsheet.
  • Relationships between instances of the same type that rely on matching of values will not be created (because these values may be overridden as part of the processing).
  • When working with Taxonomies, a combination of checked Override existing values and the No Hierarchy pattern will always make imported instances top concepts of a new Concept Scheme, even if they already exist in the Taxonomy and have parent concepts.

Selecting Record each new triple in change history makes EDG record the addition of each new triple in the change history (warning: not recommended for large files). As noted above, checking Override existing values will not let you build relationships to assets that are members of the same class as the one you are importing – because these values can be replaced as part of the import.

The Preview button on the Import Spreadsheet form lets you see the RDF triples that would be generated with the currently configured settings. The browser’s Back button returns to the form.

The optional Make this a reusable mapping template field lets you save all of the settings on this form so that you can later import other spreadsheets with a similar layout without filling it out again. Once you use this field to name a template of settings, if you later pick Import Spreadsheet using Template on the Import tab instead of Import Spreadsheet using Pattern, you’ll see a drop-down list of the saved template names.

When you are satisfied with the sample data shown on the preview screen, click the Finish button. EDG will start the import, running it in the background.

Import Using Hierarchical Spreadsheet Patterns

After you select one of the hierarchical patterns, you will see three sections:

  • Column Mappings
  • Hierarchy
  • Unique Identifiers

And, as on the previous page, for convenience you will see an example of your source data.

You will also see several options below the Unique Identifiers section. However, if your URI patterns are based on the counter or UUIDs, you will not see an option to specify columns to use as URIs. EDG will always assign URIs for you.

This means that when your asset collection uses the counter or UUID construction methods, these importers cannot be used to overwrite existing hierarchies – you can only use them to create new resources:

  • If you need to add information to the previously imported hierarchies, use No Hierarchy import.
  • If you have an existing hierarchy and want to add to it a child tree consisting of new resources, you can use these importers. However, the top of the new tree will not be connected to an already existing parent. You will need to add this connection after the import.

This aside, most of the options on the hierarchical import pages are the same as those described  for the No Hierarchy pattern – read the relevant descriptions above under the No Hierarchy Spreadsheet pattern. This section of the guide focuses on the unique aspects of the hierarchy mapping.

For all the hierarchical patterns, you get to select a Hierarchy Property – for example, “has broader”. This is the property that will connect items in your hierarchy. Note: All hierarchical levels will be connected using the same relationship. If you want to have different relationships between levels, use the No Hierarchy pattern.

The Generate in inverse direction checkbox will reverse the direction of how the property specified in Hierarchy property is applied.

When working with Taxonomies, you will also have an option to select an already existing Concept Scheme. If you are importing into a Taxonomy and do not select a concept scheme, EDG will create one for you using the name of the spreadsheet. All concepts that do not have hierarchical parents will be made top concepts of the selected or automatically created scheme.

Path with Separator Pattern

This pattern works ONLY if your URI construction method is Label-based.

For Path with Separator spreadsheets, in which a spreadsheet entry such as “World > Europe > France” indicates the hierarchical structure above the term “France”, the Hierarchy mapping section works as follows:

  • Select the column containing the path and type a separator, e.g., “>”.
  • Identify the Column containing the last node of each path string.
  • In the Column Mapping section, make sure to assign some column as the preferred label (for Taxonomies) or as the label (for all other asset collections) so that names are generated for imported resources. This will typically be the same column as the one you selected in the Hierarchy section as containing the last node of the path.
  • Map this column again in the Unique Identifiers section. If you leave this section empty, EDG will use row numbers to generate URIs. Alternatively, you could use some other column(s) to generate URIs.
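The path parsing for this pattern can be sketched as follows; this is a minimal illustration of how the hierarchy is derived from a path cell, not EDG’s implementation.

```python
def broader_links_from_path(path, separator=">"):
    """Derive (child, parent) pairs from a path cell such as
    "World > Europe > France"; whitespace around segments is ignored."""
    nodes = [part.strip() for part in path.split(separator)]
    # Each node's parent is the node immediately before it in the path.
    return list(zip(nodes[1:], nodes[:-1]))
```

For the example cell above, the derived pairs are ("Europe", "World") and ("France", "Europe").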

Column-based Tree Pattern

If you use this pattern and your URI construction method is counter or UUID based, make sure that your spreadsheet has a single column containing a name for each resource. When you use Label-based URI construction, you do not need such a column; EDG will assume that the hierarchy columns contain labels.

For Column-based Trees spreadsheets, the Hierarchy mapping section works as follows: 

  • Specify the top and bottom levels of the hierarchy by picking the first and last columns containing hierarchical levels. All hierarchical columns must be located together – one after another – in the spreadsheet.
  • If you mapped a column in the Hierarchy section, DO NOT map it in the Column Mapping section or the Unique Identifiers section. These sections are used ONLY for mapping columns that do not specify the column-based tree.
  • If you are using the Label-based URI construction method, EDG will assume that the hierarchy columns contain the labels of the respective resources. If you are using a different URI construction method, as mentioned above, you need a separate column (outside of the hierarchy) that contains a label for each resource.
  • If you are using the Label-based URI construction method and leave the Unique Identifiers section empty, EDG will use the values in your hierarchical columns to generate URIs. Only make mappings in this section if you want to override the Label-based approach and use some other values for the URIs.
  • Carefully examine the Column-based Trees sample layout on the Select Spreadsheet Type screen. It is important that each item in the hierarchy has a row of its own. See below for correct and incorrect options.

It is important to remember that, as with all spreadsheet import options, all resources will be imported as members of the same class. In other words, you can’t say that Level 3 represents countries and Level 2 represent continents. They must all have the same type – the one you select on the first page of the import wizard.
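Assuming each row carries exactly one value, in the column matching its level, the parent of a row is the value of the most recent row one level up. A minimal sketch of that derivation (not EDG’s implementation):

```python
def tree_from_rows(rows):
    """Derive (child, parent) pairs from column-based tree rows. Each row is a
    tuple of cells with exactly one non-empty value; the index of that cell
    is the hierarchy level (0 = top)."""
    pairs, last_at_level = [], {}
    for row in rows:
        level = next(i for i, cell in enumerate(row) if cell)
        value = row[level]
        parent = last_at_level.get(level - 1)  # most recent ancestor row
        if parent is not None:
            pairs.append((value, parent))
        last_at_level[level] = value
    return pairs

rows = [
    ("World", "", ""),
    ("", "Europe", ""),
    ("", "", "France"),
    ("", "", "Spain"),
]
```

Here both France and Spain are attached to Europe, which is itself attached to World – each on its own row, as the sample layout requires.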

 

Path with Fixed-length Segments Pattern

For this pattern, you need a column, called the path column, whose values are such that removing a string of a fixed length from a value identifies the parent of the resource on that row. For example, if using 2-character segments and Australia has a path column value of 010201, its parent would be on the row with path column value 0102, and the parent’s parent on the row with path column value 01.

In other words, EDG will find a parent by removing from the child’s path column value the exact number of characters specified as the segment length. The topmost items may have a path column value that is shorter or longer than the segment length, e.g., 1 instead of 01.
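The parent computation amounts to trimming one segment off the end of the path value. A minimal sketch, using the example above (illustration only, not EDG’s implementation):

```python
def parent_path(path_value, segment_length=2):
    """Return the parent's path value by removing the last fixed-length
    segment; values no longer than one segment are top-of-hierarchy items."""
    if len(path_value) <= segment_length:
        return None  # topmost item, no parent
    return path_value[:-segment_length]
```

So 010201 resolves to parent 0102, which resolves to 01, which is a top item.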

For Path with fixed-length Segments spreadsheets, the Hierarchy mapping section works as follows: 

  • Specify the column with the path values.
  • Specify the length of the segments used to calculate the parent row.
  • In the Column Mapping section, make sure to assign some column as the preferred label (for Taxonomies) or as the label (for all other asset collections) so that names are generated for imported resources. Otherwise, labels will not be generated.
  • If your URI construction method is Label-based, map this column again in the Unique Identifiers section. If you leave this section empty, EDG will use row numbers to generate URIs. Alternatively, you could use some other column(s) to generate URIs.

Self-join Pattern

For spreadsheets following the Self-Join pattern, the Hierarchy mapping section works as follows: 

  • Specify the Column containing the parent ids – this column will not necessarily be used to generate URIs; it is simply a way to match children and parents.
  • Specify the Column containing the child ids – this column will not necessarily be used to generate URIs; it is simply a way to match children and parents.
  • In the Column Mapping section, make sure to assign some column as the preferred label (for Taxonomies) or as the label (for all other asset collections). Otherwise, labels will not be generated. Typically, but not necessarily, this will be the column you used as the Column containing the child ids.
  • If your URI construction method is Label-based, map this column again in the Unique Identifiers section. If you leave this section empty, EDG will use row numbers to generate URIs. Alternatively, you could use some other column(s) to generate URIs.
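The child/parent matching for this pattern can be sketched as follows; the ids are illustrative, and EDG performs the equivalent join internally.

```python
def links_from_self_join(rows):
    """Derive (child, parent) pairs from self-join rows of
    (child_id, parent_id); an empty parent id marks a top item."""
    return [(child, parent) for child, parent in rows if parent]

rows = [("FR", "EU"), ("EU", "WORLD"), ("WORLD", "")]
```

Here FR joins to EU and EU to WORLD, while WORLD, having no parent id, becomes a top item.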

Import Property Shapes from Spreadsheet Columns 

This feature is available only for Ontologies. It is a replacement for the previously available Import Schema from Spreadsheet option. It is not listed in the Import tab and, instead, can be accessed directly in the editor by selecting a class for which you want to import property definitions.

For more information on this option see Create Property Shapes from Spreadsheet Columns in the Working with Ontologies guide.

The input spreadsheet must have at least two rows. The top row must list properties. Other rows should contain data. 

Import Data Set from Spreadsheet

This feature is available only for Data Assets collections.

It takes a spreadsheet as its input and creates an EDG Data Set instance with a Data Set Element for each of the spreadsheet’s columns. It includes data profiling, as shown below.

Import Crosswalk from Spreadsheet 

This feature is available only for Crosswalk asset collections.

The input spreadsheet must contain two columns. The first column must contain values necessary to build URIs of resources in the From asset collection. The second column must contain values necessary to build URIs of resources in the To asset collection.

 

Import Concepts from Documents

This feature is available only for Taxonomies.

This import option lets you upload a file. EDG will analyze the file, extracting concepts from it. Extracted concepts are presented for selection in order of their frequency in the document. You will be able to select some of the concepts to include in your taxonomy. Selected concepts will be placed in a scheme that EDG creates to hold new concepts. You can then organize them hierarchically as desired. This option is particularly suitable for seeding a new taxonomy.

Import Single Document

This feature is available only for Corpora.

It lets you upload documents into a corpus one document at a time.

Import From JDBC Connection

Two types of asset collections, Datatypes and Data Assets, additionally support importing from JDBC connections and DDL files.

This function serves the same purpose as Import DDL File (below). The difference is that the DDL source is a live connection to a database server, rather than a DDL file.

It has the following parameters:

Model for Datatype Definitions: If you do not have an existing collection of datatype definitions, just create a new EDG Datatypes collection; it will automatically be populated with some standard datatypes.

JDBC URL: the connection address string, whose format depends on the database type (e.g., jdbc:mysql://localhost:3306/mydb123).

User Name and Password: a database login with access to the desired tables. Note: if the password is already in EDG’s secure storage, it may be omitted.

Database name: Similar to the file-based import (below), this identifies the scope of this import operation: the desired database/schema. If it is not provided, the import will use the default scope of the user or connection.

Include data statistics: If checked, this will compute statistics summarizing the data contained in each imported entity (table, view) and column. Edit or view the data asset collection and select each asset item to see the details of its resulting statistics.

Include data samples: If checked, this will collect sample rows from each entity (table, view). Edit or view the data asset collection and select each entity table or view to see the sample data.

Maximum number of data samples per table: When including data samples, this is the upper limit of rows to collect from each table for the sample data.

Record each new triple in change history: (NOTE: This option only shows if the Record Triple Count option on the Manage tab is inactive.) If checked, this will record each imported triple in the collection’s history, which is not recommended for large imports.

The Schedule Import section lets you schedule imports to run on a recurring schedule.
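The parameters above can be sketched as follows. The host, port, database name, and login are invented examples; the JDBC URL follows the MySQL pattern cited above (jdbc:mysql://localhost:3306/mydb123), and other database types use their own URL formats.

```python
# Hypothetical sketch of the connection parameters for a JDBC import.
# All concrete values below are invented examples.
host, port, database = "localhost", 3306, "mydb123"
jdbc_url = f"jdbc:mysql://{host}:{port}/{database}"

params = {
    "JDBC URL": jdbc_url,
    "User Name": "edg_reader",   # a login with access to the desired tables
    "Database name": database,   # scope of the import (database/schema)
    "Include data statistics": True,
    "Include data samples": True,
    "Maximum number of data samples per table": 100,
}
print(params["JDBC URL"])
```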

Import DDL File

This importer reads DDL statements (CREATE TABLE, etc.) from a SQL file, and creates corresponding entities in EDG. It is available on two types of asset collections: Data Assets and Datatypes.

When importing into a Data Asset Model, the following entities from the DDL file are created:

  • A Relational Database
  • Any Database Tables defined
  • Any Database Views defined
  • The Database Columns of the tables and views
  • A Physical Data Model that serves as a container for all of the above

Database name: The importer will prefix all entity names with a database name, to distinguish the entities created by importing different databases. If no database name is specified, then the name of the SQL file will be used (e.g., NORTHWIND for northwind.sql). The database name serves a role similar to the Catalog names and Schema names within a database server.
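The fallback naming rule can be sketched as below, assuming (per the NORTHWIND example above) that the file name's stem is uppercased. The helper function name is ours, not an EDG API.

```python
from pathlib import Path

def default_database_name(sql_file: str) -> str:
    """Sketch of the documented fallback: when no database name is
    given, the SQL file's name is used (NORTHWIND for northwind.sql).
    Uppercasing the stem is inferred from that example."""
    return Path(sql_file).stem.upper()

print(default_database_name("northwind.sql"))  # NORTHWIND
```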

Model for Datatype Definitions: The importer also stores the datatype of each table column. It will re-use existing datatype definitions for previously seen types, and create new ones for the rest. The drop-down tells the importer where to look for datatype definitions and where to import new ones. The options are:

  • Any EDG Datatypes that have been included into the Data Asset (via General > Includes).
  • The Data Asset itself.

To store imported datatype definitions, we recommend using EDG Datatypes rather than storing them in the Data Assets themselves.

SQL compatibility

Due to the wide range of SQL dialects and vendor-specific features, we cannot guarantee that all SQL files can be successfully imported.

Problems can come in two forms:

  1. SQL statements that cannot be parsed: When an input file cannot be parsed, the import process will be aborted and nothing will be imported. An error message will be shown indicating the location in the file where the parse error occurred. It may be possible to manually edit the SQL file to remove the unsupported SQL features.
  2. SQL statements only partially understood: In some cases, the importer will be able to understand the basic intent of a DDL statement, but not some specific parameter or argument to the statement. In this case, it will continue the import, and import whatever was understood. Therefore, imported data should be carefully reviewed to ascertain that all needed information has been imported.

Customizing the DDL import

See the Developer Guide: Post-processing of DDL Imports, for details on customizing the DDL import results.

Validate and/or Import Standard Lineage Spreadsheet

This option is only available for Lineage Model asset collections.

This imports Lineage Model contents (data transformations) from a CSV file in the format described here. Each CSV row represents one data transformation step with three logical parts: (1) a source input to an executable, (2) the executable itself, and (3) a target output. Source inputs might be database columns, files, etc. Executables might be stored procedures or scripts. Target outputs might be other columns, reports, etc. An example CSV template is provided in the EDG workspace project: edg.topbraidlive.org/1.0/templates/import/ImportStandardLineageSpreadsheet.csv.

Optionally, the transformation steps can be assigned to specific pipeline steps. A second example template with pipeline steps is provided in the same project: edg.topbraidlive.org/1.0/templates/import/ImportStandardLineageTemplateWithPipelines.csv.

The CSV columns are as follows. Note that not all columns are used in all situations. See subsequent sections for details.

(0) Pipeline columns (optional):

  • PIPELINE_NAME: identifies a set of related pipeline steps
  • PIPELINE_STEP: 1, 2, 3, …

(1) Input Source columns (* indicates required fields):

  • SOURCE_DATABASE_TYPE *: supported values: DB2, MSSQL, NETEZZA, ORACLE, TERADATA, FILE SYSTEM, HIVE. Any other value is treated as a custom object; see Custom Sources and Targets (below) for details.
  • SOURCE_SERVER_NAME: if applicable, hostname of the …_DATABASE_TYPE
  • SOURCE_DATABASE_NAME: if applicable, the database (server instance) or directory path
  • SOURCE_DATABASE_SCHEMA: if applicable, the database/schema (persistent data)
  • SOURCE_DATASET_NAME: name of the data element’s table/view/synonym/directory structure, etc.
  • SOURCE_DATASET_TYPE: type of the data element’s table/view/synonym/directory structure, etc.
  • SOURCE_DATA_ELEMENT_NAME: name of the specific element/column of the dataset that the executable uses

(2) Executable columns:

  • EXECUTABLE_SYSTEM_TYPE: every executable has an executable system type. For example, if the executable is a stored procedure or a trigger, the system type might be ORACLE, MSSQL, TERADATA, or PLSQL; for external scripts and workflows it can also be IFPC, MSSQL scripts, Oracle scripts, etc. For more details on scripts or custom executables, see the corresponding sections below.
  • EXECUTABLE_SERVER_NAME: hostname of the executable
  • EXECUTABLE_DATABASE_NAME: if applicable, the executable’s declaring database (server instance) or the script’s file path
  • EXECUTABLE_SCHEMA_NAME: if applicable, the executable’s database/schema (persistent data)
  • EXECUTABLE_NAME *: the program, script, stored procedure, job, etc.
  • EXECUTABLE_TYPE *:

Allowed values:

  • SCRIPT
  • SOFTWARE FUNCTION
  • ETL JOB
  • STORED PROCEDURE
  • SOFTWARE MACRO
  • SOFTWARE TRIGGER
  • VIEW
  • SYNONYM
  • INFORMATICA SESSION
  • CUSTOM TRANSFORM
(3) Output Target columns (* indicates required fields):

  • TARGET_DATABASE_TYPE *: supported values: DB2, MSSQL, NETEZZA, ORACLE, TERADATA, FILE SYSTEM, HIVE. Any other value is treated as a custom object; see Custom Sources and Targets (below) for details.
  • TARGET_SERVER_NAME: if applicable, hostname of the …_DATABASE_TYPE
  • TARGET_DATABASE_NAME: if applicable, the database (server instance) or directory path
  • TARGET_DATABASE_SCHEMA: if applicable, the database/schema (persistent data)
  • TARGET_DATASET_NAME: name of the data element’s table/view/synonym/directory structure, etc.
  • TARGET_DATASET_TYPE: type of the data element’s table/view/synonym/directory structure, etc.
  • TARGET_DATA_ELEMENT_NAME: name of the specific element/column of the dataset that the executable uses
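Sketched in Python, one lineage row with the required (starred) columns filled might look like this. All server, database, and executable names are invented, and the optional pipeline columns are omitted.

```python
import csv
import io

# Hypothetical sketch of one transformation row in the standard lineage
# CSV: an invented stored procedure copying a column between databases.
columns = [
    "SOURCE_DATABASE_TYPE", "SOURCE_SERVER_NAME", "SOURCE_DATABASE_NAME",
    "SOURCE_DATABASE_SCHEMA", "SOURCE_DATASET_NAME", "SOURCE_DATASET_TYPE",
    "SOURCE_DATA_ELEMENT_NAME",
    "EXECUTABLE_SYSTEM_TYPE", "EXECUTABLE_SERVER_NAME",
    "EXECUTABLE_DATABASE_NAME", "EXECUTABLE_SCHEMA_NAME",
    "EXECUTABLE_NAME", "EXECUTABLE_TYPE",
    "TARGET_DATABASE_TYPE", "TARGET_SERVER_NAME", "TARGET_DATABASE_NAME",
    "TARGET_DATABASE_SCHEMA", "TARGET_DATASET_NAME", "TARGET_DATASET_TYPE",
    "TARGET_DATA_ELEMENT_NAME",
]
row = {  # only required fields filled; everything else left blank
    "SOURCE_DATABASE_TYPE": "ORACLE",
    "SOURCE_DATABASE_NAME": "SALESDB",
    "SOURCE_DATASET_NAME": "ORDERS",
    "SOURCE_DATASET_TYPE": "TABLE",
    "SOURCE_DATA_ELEMENT_NAME": "ORDER_TOTAL",
    "EXECUTABLE_SYSTEM_TYPE": "ORACLE",
    "EXECUTABLE_NAME": "LOAD_DAILY_TOTALS",
    "EXECUTABLE_TYPE": "STORED PROCEDURE",
    "TARGET_DATABASE_TYPE": "ORACLE",
    "TARGET_DATABASE_NAME": "DWH",
    "TARGET_DATASET_NAME": "DAILY_TOTALS",
    "TARGET_DATASET_TYPE": "TABLE",
    "TARGET_DATA_ELEMENT_NAME": "TOTAL",
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns)
writer.writeheader()
writer.writerow({c: row.get(c, "") for c in columns})
print(buf.getvalue())
```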

Detailed Examples for Sources and Targets

The properties of source and target data vary depending on their type (as declared in SOURCE_DATABASE_TYPE / TARGET_DATABASE_TYPE). The following sections list the required values and constraints for the different types of data sources and targets.

Relational Database Sources and Targets

(1) Input Source / (3) Output Target columns (* indicates required fields):

  • SOURCE_DATABASE_TYPE * / TARGET_DATABASE_TYPE *: supported RDB values: DB2, MSSQL, NETEZZA, ORACLE, TERADATA. Any other value is treated as a custom type; see Custom Sources and Targets (below) for details.
  • SOURCE_SERVER_NAME / TARGET_SERVER_NAME: optional; hostname of the corresponding …_DATABASE_TYPE
  • SOURCE_DATABASE_NAME * / TARGET_DATABASE_NAME *: name of the database (server instance)
  • SOURCE_DATABASE_SCHEMA / TARGET_DATABASE_SCHEMA: optional; name of the database/schema (persistent data)
  • SOURCE_DATASET_NAME * / TARGET_DATASET_NAME *: name of the corresponding TABLE, VIEW, or SYNONYM
  • SOURCE_DATASET_TYPE * / TARGET_DATASET_TYPE *: supported RDB values: TABLE, VIEW, or SYNONYM
  • SOURCE_DATA_ELEMENT_NAME * / TARGET_DATA_ELEMENT_NAME *: name of the specific element/column of the TABLE, VIEW, or SYNONYM that the executable uses

Example NeighborGram for relational database I/O

Relational database CSV row:

Corresponding NeighborGram:

File Sources and Targets

(1) Input Source / (3) Output Target columns (* indicates required fields):

  • SOURCE_DATABASE_TYPE * / TARGET_DATABASE_TYPE *: supported value: FILE SYSTEM
  • SOURCE_SERVER_NAME / TARGET_SERVER_NAME: if applicable, hostname of the file
  • SOURCE_DATABASE_NAME * / TARGET_DATABASE_NAME *: directory path where the file resides
  • SOURCE_DATABASE_SCHEMA / TARGET_DATABASE_SCHEMA: NOT APPLICABLE
  • SOURCE_DATASET_NAME / TARGET_DATASET_NAME: the name of the file
  • SOURCE_DATASET_TYPE / TARGET_DATASET_TYPE: supported value: FILE
  • SOURCE_DATA_ELEMENT_NAME / TARGET_DATA_ELEMENT_NAME: column of the file (1, 2, 3, etc.)

Example NeighborGram for a file output

An example SCRIPT has an input from a database column and an output to a column in a file:

Custom Sources and Targets

(1) Input Source / (3) Output Target columns (* indicates required fields):

  • SOURCE_DATABASE_TYPE * / TARGET_DATABASE_TYPE *: any unsupported value is considered a custom Data Container, and the source element is considered a Custom element. NOTE: this field is required, so if the custom object does not belong to a specific container, the field can have the value Custom Data Container.
  • SOURCE_SERVER_NAME / TARGET_SERVER_NAME: NOT APPLICABLE
  • SOURCE_DATABASE_NAME / TARGET_DATABASE_NAME: NOT APPLICABLE
  • SOURCE_DATABASE_SCHEMA / TARGET_DATABASE_SCHEMA: NOT APPLICABLE
  • SOURCE_DATASET_NAME / TARGET_DATASET_NAME: the level hierarchy, if one exists, excluding the last level, which is stored in the data element (e.g., type-level5/type-level4/type-level3/type-level2)
  • SOURCE_DATASET_TYPE / TARGET_DATASET_TYPE: type of the custom element, if one exists. If the type is an existing class in EDG, the custom object is created as an instance of this type; in all cases, the custom object is also of type CustomObject.
  • SOURCE_DATA_ELEMENT_NAME / TARGET_DATA_ELEMENT_NAME: name of the specific element/column of the dataset that the executable uses
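The split between the DATASET_NAME hierarchy and the final level in DATA_ELEMENT_NAME can be illustrated as follows. This is an interpretation sketch using the example values from the table above, not EDG code.

```python
# Hypothetical sketch: for a custom source/target, DATASET_NAME carries
# the level hierarchy minus the last level, and DATA_ELEMENT_NAME carries
# the last level. Values follow the example in the table above.
dataset_name = "type-level5/type-level4/type-level3/type-level2"
data_element_name = "type-level1"

# Reassemble the full hierarchy, top level first.
levels = dataset_name.split("/") + [data_element_name]
print(levels)
```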

Detailed Examples for Executables

Executables can either reside in relational databases (e.g., stored procedures, functions, triggers) or be external scripts. Whenever a row describes the creation of a view, a synonym, or a custom target, the executable type is correspondingly VIEW, SYNONYM, or CUSTOM TRANSFORM. The following sections describe the different executable types.

Relational Database Executables

NOTE: To declare a create-view or create-synonym operation in a database, set EXECUTABLE_TYPE to VIEW or SYNONYM.

Executables (for Relational Databases):

  • EXECUTABLE_SYSTEM_TYPE: supported RDB values: DB2, MSSQL, NETEZZA, ORACLE, TERADATA, HIVE.
  • EXECUTABLE_SERVER_NAME: if applicable, hostname of the database server
  • EXECUTABLE_DATABASE_NAME: the executable’s (stored procedure’s) database (server instance)
  • EXECUTABLE_SCHEMA_NAME: if applicable, the executable’s (stored procedure’s) database/schema (persistent data)
  • EXECUTABLE_NAME *: name of the executable (e.g., the stored procedure)
  • EXECUTABLE_TYPE *:

Allowed values:

  • SOFTWARE FUNCTION
  • ETL JOB
  • STORED PROCEDURE
  • SOFTWARE MACRO
  • SOFTWARE TRIGGER
  • VIEW
  • SYNONYM
  • INFORMATICA SESSION

Script Executables

Executables (for Scripts):

  • EXECUTABLE_SYSTEM_TYPE: when EXECUTABLE_TYPE is SCRIPT, see Executable System Types (below) for the supported script-type values of EXECUTABLE_SYSTEM_TYPE. If the script type is unsupported, the SCRIPT will be created in EDG without a specific type.
  • EXECUTABLE_SERVER_NAME: if applicable, hostname of the script
  • EXECUTABLE_DATABASE_NAME: if applicable, the directory structure of the executable; it will be created as a FILE SYSTEM in EDG
  • EXECUTABLE_SCHEMA_NAME: NOT APPLICABLE
  • EXECUTABLE_NAME *: name of the script
  • EXECUTABLE_TYPE *: value: SCRIPT

Custom Executables

Executables are custom when their EXECUTABLE_TYPE is CUSTOM TRANSFORM. Conventionally, this covers the cases where the target object is custom. In EDG, the executable is created as a Mapping between the source element and the target custom object.

Executables (for Custom executables):

  • EXECUTABLE_SYSTEM_TYPE: the executable’s container, which is often the same as the target database type, but this is not restricted
  • EXECUTABLE_SERVER_NAME: NOT APPLICABLE
  • EXECUTABLE_DATABASE_NAME: NOT APPLICABLE
  • EXECUTABLE_SCHEMA_NAME: NOT APPLICABLE
  • EXECUTABLE_NAME *: name of the custom transform; we consider it to be the name of the target custom element
  • EXECUTABLE_TYPE *: value: CUSTOM TRANSFORM

Example row of a custom executable and related NeighborGram

Executable System Types

Supported values for EXECUTABLE_SYSTEM_TYPE:

  • DB2 SCRIPTS
  • HIVE SCRIPTS
  • IFPC
  • JAVA
  • MSSQL SCRIPTS
  • MSSQL TRANSACT-SQL
  • NETEZZA SCRIPTS
  • ORACLE PL/SQL
  • ORACLE SCRIPTS
  • PLSQL
  • PYTHON
  • SCALA
  • TERADATA BTEQ
  • TERADATA PT
  • TERADATA SCRIPTS

Import TriG file

The Import TriG file button can be found on the landing page of each collection type, i.e., the page that lists all collections of a given type: for example, Taxonomies, Glossaries, or Data Assets.

This option allows you to upload a TriG file, such as one exported using Export > TriG with Includes or Admin > Create TriG file of all EDG production graphs (used for testing).

TriG files contain so-called named graphs. The named graphs that follow the EDG naming convention (urn:x-evn-master:XY) will be turned into asset collections with yourself as the manager. Other named graphs will be turned into Turtle files in the workspace.
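The naming convention can be sketched as a simple check. The graph URIs below are invented examples; only the urn:x-evn-master: prefix comes from the text above.

```python
# Hypothetical sketch of the convention described above: named graphs
# matching urn:x-evn-master:XY become asset collections, and any other
# named graph is written out as a Turtle file in the workspace.
EDG_PREFIX = "urn:x-evn-master:"

def import_target(graph_uri: str) -> str:
    if graph_uri.startswith(EDG_PREFIX):
        return "asset collection: " + graph_uri[len(EDG_PREFIX):]
    return "Turtle file in the workspace"

print(import_target("urn:x-evn-master:myTaxonomy"))
print(import_target("http://example.org/other-graph"))
```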

Already existing graphs will remain unchanged.

The operation is not permitted if a named graph already exists but you don’t have at least read permission on it, or if there are any triples in the default graph of the TriG file.

Import Property Definitions (Schema) from a Spreadsheet

You can use data in a spreadsheet to create property definitions in an ontology. For information on how to do this, see the guide on Ontologies.

Import Property Definitions (Schema) from a SPARQL Endpoint

You can use information in Knowledge Graphs that are external to your EDG installation to create property definitions in an ontology. For information on how to do this, see the guide on Ontologies.