TopBraid EDG lets you load metadata and data from external sources such as spreadsheets, RDBMs, DDL files, JSON, RDF files, XML, RSS feeds, SPARQL endpoints and other formats – with flexible mapping to EDG models.

Pre-built import options are available from the Import page (tab) for an asset collection or a workflow (working) copy of an asset collection. These options are described below. They will vary depending on the type of collection. You will only see the Imports page if you have edit permissions.

You may see more import options than described here. You may also see less:

  • Additional importers can be configured using Import modules that are part of TopBraid EDG’s platform. If your organization configures a new import to appear on the Imports page, you will see it listed as one of the available options alongside with the pre-built importers.
  • Your organization may also deactivate some or all of the Import page options. This decision can be made on a per collection basis on the Manage page (tab). In this case, you will see less import operations than described here.

In addition to being available on the Import page, all import operations can be executed as services.

Dynamic imports from the Linked Data sources, including refreshing of the previously loaded values, are executed using options available on the Transform page (tab).

Import of documents and associated metadata into Corpora asset collections is performed dynamically according to the connector option selected when creating a corpus.

Imports described here are effectively “pull” operations. In other words, EDG is pulling data through various mechanisms and connectors. EDG is an open solution and a very powerful alternative is for external systems to “push” data to EDG. This can be done by updating EDG through its GraphQL and SPARQL Endpoints.

Audit Trail for Import Operations

When importing directly into an asset collection:

The Record each new triple in change history checkbox is available for each import operation. It gives you the option of recording each individual change in the change history. Note that this left unchecked by default and is not recommended when importing large amounts of data. When this option is left unchecked, the change history will contain a record capturing that an import was executed but will not contain all individual additions and deletions of data.

Further, using the Manage page (tab) for an asset collection, you can activate Record Triple Counts only option. If activated, the change history will only record the numbers of added and deleted triples instead of the details about each triple. This can significantly reduce the size of the change history. This choice has precedence (is it true?) over selecting the Record each new triple in change history during a specific import operation.

When the import is done as part of a workflow:

Every change will always be added as an entry in the change history, where it will be available to all the relevant reports. Decision not to retain this detailed history upon completion of the workflow could be made when designing a workflow template.

Further, using the Manage page for an asset collection, you can activate Archive Working Copies on Commit option. If activated, detailed change history for committed working copies will be automatically archived. Note that this has precedence over the Record Triple Counts only option, i.e. the archive will contain the full history before being compacted.

Import View

From any asset collection’s production or working-copy home page, the Import functions lets you load data into the given asset collections from external sources such as RDF files, spreadsheets, etc. You will only see the Import View and its operations if you have edit privilege for the asset collection.

For spreadsheet imports, if URI construction rules are set to Default or Label –
– ‘Label’ field must be chosen manually under ‘Unique Identifiers’ – it does not get set by default
– URIs are generated by choosing up to 3 fields from the spreadsheet to append together
– If trying to import into existing data, URI choice must match previous imports

For spreadsheet imports, if URI construction rules are counter or UUID –
– URIs are automatically generated based on these rules
– If importing into existing data, you will be given the choice to match one column
– If this match column is set, it will attempt to match URI’s based on this field (even it the URIs were generated with a previous rule)
– If ‘tree type’ import is chosen, this match column defaults to skos:prefLabel – with no option given to change it.

Import RDF File

Any asset collection can import data from an external RDF file (in a serialized format). The Import > Import RDF File link shows a screen where the Choose File button opens a dialog for picking the external source file.

Choose the source file and identify its format. Decide whether to record new triples in the change history (use with care!) and then click Finish to complete the import. A message will indicate whether the import was successful.

Note that if your file contains any schema definition (classes, properties, shapes), you will only be able to import it into an Ontology. If you have both instances and schema data in one file, either split the file before import or follow instructions in Copy or Move Instances from Other Asset Collection.

When importing RDF files into an Ontology or into a Taxonomy, EDG will perform some transformations:

  • For ontologies, “subclass of Thing” statements will be added for classes that have no parents. This is done to ensure that these classes are visible in the Class Hierarchy.
  • For taxonomies,  “narrower concept” relationships will be used to generate inverse “broader concept” relationships. This is done to ensure that such concepts are visible in the Concept Hierarchy.

When importing RDF into a Working Copy, the addition of each triple can be added as an entry in the change history, where it will be available to all the relevant reports. When importing into a production copy, the Record each new triple in change history checkbox gives you the option of adding these to the change history; note that this is not recommended when importing large amounts of data.

Import Spreadsheet using Template

Any asset collection except for crosswalks, content tagsets and corpora can import data from a spreadsheet using a template. Clicking on this option lets you pick a spreadsheet and a template that will be used to convert the spreadsheet data into reference data. The template may be created using the mapping process explained in the …using Pattern section, below.

The mapping can also be created using TopBraid Composer when the simple mapping described above is insufficient and you need to perform more complex transformations–for example, concatenation of values. TopBraid Composer’s SPINMAP tool provides a drag-and-drop interface that makes it especially easy to create more complex mappings.

Templates developed with TopBraid Composer must be stored in files with “.tablemap.” in their name (for example, myMapping.tablemap.ttl) and be uploaded to the EDG server to be available to EDG users. Spreadsheet imported using a template must have exactly the same structure as the spreadsheet used to develop the template. The names and order of the columns must be exactly the same. If multiple worksheets are used, the order and structure of each worksheet (even for worksheets that are not imported) must be the same.

Import Spreadsheet using Pattern

Any asset collection except for crosswalks, content tagsets and corpora can import data from a spreadsheet using a pattern. The Import > Import Spreadsheet Using Pattern link shows the following screen:

Click Choose File to pick the spreadsheet file with data you want to import. This may be an Excel file (file-type extensions: .xls or .xlsx), a tab-separated value (.tsv) file, or a comma-separated value (.csv) file. The file-name should have the expected extension. Because an Excel file may have more than one sheet of data, this screen lets you specify a sheet index value to identify which sheet to read in. The default is 1, for the first sheet.

The sheet index counts all sheets in an Excel workbook, including hidden ones. For example, if you enter a 3 here and EDG seems to import the second sheet, there may be a hidden one between the first and second sheet that made the third one look like it was the second one when Excel was displaying the workbook. The Excel online help explains how to check for the existence of hidden sheets.

The Entity type for the imported data field identifies what type of assets you will be importing. Each row in the spreadsheet will be brought into EDG as an instance of the selected class and you will be able to map spreadsheet columns to the declared properties of the class. Make the (1) file, (2) sheet index, and (3) entity type (class) selections and click  Next .

Select Spreadsheet Type

This view enumerates five possible (column-wise) layout patterns for the spreadsheet’s row-item data, showing an example of each pattern.  For data explicitly structured into a hierarchy, like a taxonomy, there are four layout options.  For all other data, there is the No Hierarchy layout (#1). In the hierarchical layouts, each row item also indicates its hierarchical path, either explicitly (absolute path, #2, #3, #4) or implicitly (recursive path, #4, #5; note that lighter text in the layout patterns indicates optional data).

Note the header row of column labels in every layout. The imported spreadsheet should have such a header row.

Below the five layout options, the view shows a sample of the spreadsheet’s actual data.  This following image shows a spreadsheet of airport codes.

Select the layout title link that most correctly corresponds to the chosen spreadsheet’s structure.

Import Spreadsheet

This view defines the data-mapping rules from the spreadsheet’s columns into the class and optional hierarchy structures chosen for the collection data. It also lets users to (1) preview the import, (2) save its settings as a template for future imports, and it (3) initiates the import processing.

Column Mapping

The Column Mapping settings specify which spreadsheet columns correspond to which properties in the target Data Assets. Typically, the target properties belong to the target entity type (class or asset type) you have selected on the previous page. When column names are close to the property names, EDG will automatically propose the mapping.

The mapping also supports inverse relationships to the target entities. Any unmapped columns will be ignored during the import, their data will not be imported.

The following example shows a mapping page presented for the “No Hierarchy” pattern.

When importing attribute values, the spreadsheet column should contain cell-values that matching the datatype of the target attribute. If, as part of the mapping, you select an optional Language value, import will add the selected language tag.

When importing relationships EDG needs to understand which existing resources to build a link to. Alternatively, it can create a URI as a value of the relationship even if a resource with the URI does not yet exist in the asset collection. After you select a relationship from the dropdown, you will have the following methods for directing EDG on how to build relationship values:

  1. If the related class has a designated primary key (PK) constraint, no additional information is needed. EDG will use the data in the mapped column to form the URIs of related resources according to the primary key definition.
    • This means that if the related class has a primary key defined, values in the column must correspond to the property used as the key.
    • For example, in the screenshot above Airport has an “airport country” relationship to Country and Country has a primary key that uses 2 character country codes. Thus, we mapped the Country Code column in the spreadsheet to the “airport country” relationship and did not need to provide any additional information.
  2. If all the values in the column are recognized as valid URLs, then they can be used as-is as URIs of related resources, as indicated by the associated Use values as URIs label.
  3. If neither option 1 nor option 2 fit your situation, you will see an option to select a property of the related class to match on. This property will be used to find resources at the end of the relationship.
    • For example, if Country did not have a primary key, we could map either Country or Country Code to the “airport country” relationship.
    • If we mapped the Country column, we would then select the “short name” as the property which values to use to identify the related resources.
    • If we mapped the Country Code column, we would select “ISO 3166-2 alphabetic country code” as the property which value to use to identify the related resources.
    • Unlike the first two options, this method will only create a relationship if it finds already existing resource it can connect to.

Option 3 is demonstrated in the screenshot below by removing the primary key from the class Country.

For inverse relationships, the spreadsheet column represents links from instances of some other class to the resources that are being imported. Similar to forward (non-PK) relationships, if an inverse relationship is the chosen mapping, then there is a further choice of which referring-class property to use to identify the referring instances.

Even if the referenced property is not designated as a primary key (PK), it is still assumed that all of the corresponding property-values are unique across all referenceable instances. If duplicate values exist, then the referenced instance will be assigned arbitrarily.

Also note that if the related class designates a primary key property or if column values are URLs, then imported rows will always construct a reference, regardless of whether referenced resources exist. On the other hand, if there is no such PK designation, then imported rows will construct a reference only if a matching instance exists.

As explained below, if the target of a relationship has the same entity type as the entity type for the imported data AND you are using matching on the property values to build a relationship, the Override existing values option must be unchecked. Otherwise, the relationship will not be created.

If the imported rows will create any new instances, as opposed to only adding data to existing ones, then one of spreadsheet columns should be mapped to the label property, which is used as the display name.

When importing into Reference Datasets, one of spreadsheet column must map to the primary-key property that is designated for the dataset’s main entity class. For example, the screen image above identifies this field as the IATA code.

When importing into Taxonomies, you will be able to select a concept scheme that should contain imported concepts. Otherwise, EDG will create a new concept scheme and make all concepts that have no parents in the spreadsheet its top concepts.

If the imported rows are adding new data values to existing instances and/or adding new instances, it is best to uncheck the Override existing values option. Checking this option has the following consequences:

  • If an instance already exists and has a value for any of the mapped columns, the value will be replaced with data coming from a spreadsheet.
  • Relationships between instances of the same type that rely on matching of values will not be created (because these values may be overridden as part of the processing).
  • When working with Taxonomies, a combination of checked Override existing values and the No Hierarchy pattern will always make imported instances top concepts of a new Concept Scheme, even if they already exist in the Taxonomy and have parent concepts.

Hierarchical spreadsheet types

If a hierarchical pattern was selected, then there will also be Hierarchy settings that specify how the spreadsheet represents the hierarchical relationships of its data items.

Note that one could still use the No Hierarchy import pattern for a taxonomy. This would import spreadsheet terms as separate concepts without connecting them hierarchically unless you can indicate some property to use to find broader concepts. The taxonomy’s editors could then use the Hierarchical Application View view to manually arrange the imported nodes in the taxonomy’s concept tree.

  • For Path with Separator spreadsheets, make sure to assign at least one column under Column Mapping as the preferred label. For other types, the importer can often infer the preferred label column. (See below about the Preview button.)
  • For Column-based Trees spreadsheets, specify the top and bottom levels of the hierarchy by picking the first and last column names.
  • For Path with fixed-length Segments spreadsheets, specify the column with the path values used as IDs and the length of the segments within the path IDs. In the Path with fixed-length Segments sample layout on the Select Spreadsheet Type screen, the Id column has the path values, and each two-digit segment of these values indicates a step of the hierarchy; removing the last two digits of any of those Id values shows the Id value for that term’s parent. For example, Australia has an Id value of 010201, which has a parent value of Pacific (Id value 0102), which has a parent of World (Id value 01).
  • For Path with Separator spreadsheets, in which a spreadsheet entry such as “World > Europe > France” indicates the hierarchical structure above the term “France”, specify the column storing these values using the Column containing the paths field and the Path separator character in the field with that name. If your spreadsheet also includes an ID column, the Hierarchy section includes a dropdown field to indicate this.
  • For Self-Join spreadsheets, there are columns to specify the Column containing the parent ids and the Column containing the child ids. In the Self-Join sample on the Select Spreadsheet Type screen, these would be the Parent and Term columns, respectively. A Hierarchy Property field also lets you set whether you want the “has broader” property used to identify relationships in the displayed hierarchy or something else.
  • The Generate in inverse direction checkbox will reverse the direction of how the property specified in Hierarchy property is applied.

Unique Identifiers

This section defines the column(s) and the namespace to be used to generate URI for each imported row. Namespace will be pre-populated using the default namespace of the collection.

If URI construction method used for this asset collection is guid or counter, you do not need to select any columns for URI construction. If a class you are importing in has a primary key, you will not need to select any columns either. Instead, make sure to map one of the spreadsheet columns to the primary key property.

If you are using Column-based Tree hierarchical pattern, you will also not need to identify any columns for the URI creation. In case of guid and counter URI construction method, EDG will generate URIs using guids or counter values. In case of the URI construction method based on labels, EDG will expect that the hierarchy columns contain resource labels.

Selecting Overwrite existing values will delete an existing value for a mapped property before adding its new (different) value; otherwise, new values will be added to existing ones.

Selecting Record each new triple in change history (warning: not recommended for large files) prevents EDG from recording the addition of each new triple in the change history. Note that this option will not let you build relationships to assets that are members of the same class as the one you are importing – because these values can be replaced as part of the import.

Preview button on the Import Spreadsheet form lets you see the RDF triples that would be generated with the currently configured settings. The browser’s Back button returns to the form.

The Optional: Make this a reusable mapping template field lets you save all of the settings on this form so that you can later import other spreadsheets with a similar layout without filling out this form. Once you use this field to name a template of settings, if you later pick Import Spreadsheet using Template on the Import tab instead of Import Spreadsheet using Pattern, you’ll see a drop-down list of the saved template names, a Browse button to pick a spreadsheet, and a Finish button to perform the actual conversion.

When you are satisfied with the sample data shown on the preview screen, click the Finish button. EDG will display a message about successful importing of the data along with a set of data that you can use as a mapping file for imports of spreadsheets with a similar structure in the future.

Import Schema from Spreadsheet 

This feature is available only for Ontologies.

The input spreadsheet can contain multiple worksheets or, optionally, you can specify worksheet number. A worksheet must have at least two rows. The top row must list properties. Other rows should contain data. EDG will create a class using the name of the worksheet. It will then create a property for each of the columns. EDG will examine spreadsheet data to select the right datatype for each new property.

Import Crosswalk from Spreadsheet 

This feature is available only for Crosswalk.

The input spreadsheet must contain two columns. The first column must contain values necessary to build URIs of resources in the From asset collection. The second column must contain values necessary to build URIs of resources in the To asset collection. 

 

Import Concepts from Documents

This feature is available only for Taxonomies.

This import option will let you upload a file. EDG will analyze the file extracting concepts from it. Extracted concepts will be presented for selection in order of their frequency in the document. You will be able to select some of the concepts to include in your taxonomy. Selected concepts will be placed in a scheme that EDG will create to hold new concepts. You will then be able to organize them hierarchically as desired. This option is particularly suitable as a way to seed a new taxonomy.

Import MultiTes File

This feature is available only for Taxonomies.

Any taxonomy can import data from an external XML file that was exported from the MultiTes thesaurus management package. The Import > Import MultiTes File link shows a screen where the Choose File button opens a dialog for picking the external source file. Then, clicking the Finish button will produce a message indicating whether the import was successful. EDG will create a concept scheme for the imported MultiTes data based on the name of the host taxonomy. You can rename or rearrange its contents using all the standard EDG features for editing data.

 

Import Single Document

This feature is available only for Corpora.

It lets you upload documents into a corpus one document at a time.

Import From JDBC Connection

Certain asset collections, Datatypes and Data Asset Collections, support additional importing from JDBC Connections and DDL files.

This function serves the same purpose as Import DDL File (below). The difference is that the DDL source is a live connection to a database server, rather than a DDL file.

It has the following parameters:

Model for Datatype Definitions: If you do not have this, just create a new EDG Datatype collection which will automatically be populated with some standard data.

JDBC URL: the connection address-string, which may depend on the database type (e.g., jdbc:mysql://localhost:3306/mydb123),

User Name and Password: a database login with access to the desired tables; note: if the password is already in EDG’s secure storage, it may be omitted,

Database name: Similar to the file usage (below), this identifies the scope of this DDL import operation, the desired database/schema. If this not provided, the import will use the default scope of the user or connection.

Include data statistics: If checked, this will compute statistics summarizing the data contained in each imported entity (table, view) and column. Edit or view the data asset collection and select each asset item to see the details of its resulting statistics.

Include data samples: If checked, this will collect sample rows from each entity (table, view). Edit or view the data asset collection and select each entity table or view to see the sample data.

Maximum number of data samples per table: When including data samples, this is the upper limit of rows to collect from each table for the sample data.

Record each new triple in change history: (NOTE: This option only shows if the Record Triple Count option on the Manage tab is inactive.) If checked, this will record each imported triple in the collection’s history, which is not recommended for large imports.

 Schedule Import section lets you schedule imports to run on recurring schedule.

Import DDL File

This importer reads DDL statements (CREATE TABLE, etc.) from a SQL file, and creates corresponding entities in EDG. It is available on two types of asset collections: Data Assets and Datatypes.

When importing into a Data Asset Model, the following entities from the DDL file are created:

  • A Relational Database
  • Any Database Tables defined
  • Any Database Views defined
  • The Database Columns of the tables and views
  • A Physical Data Model that serves as a container for all of the above

Database name: The importer will prefix all entity names with a database name, to distinguish the entities created by importing different databases. If no database name is specified, then the name of the SQL file will be used (e.g., NORTHWIND for northwind.sql). The database name serves a role similar to the Catalog names and Schema names within a database server.

Model for Datatype Definitions: The importer also stores the datatype of each table column. It will re-use existing datatype definitions for previously seen types, and create new ones for the rest. The drop-down tells the importer where to look for datatype definitions and where to import new ones. The options are:

  • Any EDG Datatypes that have been included into the Data Asset (via General > Includes).
  • The Data Asset itself.

To store imported datatype definitions, we recommend using EDG Datatypes rather than storing them in the Data Assets themselves.

Additional Drivers

Additional drivers, up to three, can be added to your EDG installation. EDG provides jtds-1.3.1, postgresql-42.2.6, ojdbc7, mssql-jdbc-7.2.2.jre8, mariadb-java-client-2.4.2 out of the box. 

1. Stop the application.
2. Rename the driver jar file (Driver1.jar, Driver2.jar, Driver3.jar)
3. Place the jar file(s) in the installation folder /plugin/org.topbraid.lib.jdbc_*
4. Restart the app

 

SQL compatibility

Due to the wide range of SQL dialects and vendor-specific features, we cannot guarantee that all SQL files can be successfully imported.

Problems can come in two forms:

  1. SQL statements that cannot be parsed: When an input file cannot be parsed, the import process will be aborted and nothing will be imported. An error message will be shown indicating the location in the file where the parse error occurred. It may be possible to manually edit the SQL file to remove the unsupported SQL features.
  2. SQL statements only partially understood: In some cases, the importer will be able to understand the basic intent of a DDL statement, but not some specific parameter or argument to the statement. In this case, it will continue the import, and import whatever was understood. Therefore, imported data should be carefully reviewed to ascertain that all needed information has been imported.

Customizing the DDL import

See the Developer Guide: Post-processing of DDL Imports, for details on customizing the DDL import results.

Validate and/or Import Standard Lineage Spreadsheet

The Lineage Model asset collection will have an additional Import option unique for this type of collection. This imports Lineage Model contents (data transformations) from a CSV file in the format described here. Each CSV row represents one data transformation step with three logical parts: (1) a source input to an executable, (2) the executable itself, and (3) a target output. Source inputs might be database columns, files, etc. Executables might be stored procedures or scripts. Target outputs might be other columns, reports, etc. An example CSV template is provided in the EDG workspace project: edg.topbraidlive.org\1.0\templates\import\ImportStandardLineageSpreadsheet.csv.

Optionally, the transformation steps can be identified for specific pipeline steps. A second example template with pipeline steps is provided in project: edg.topbraidlive.org\1.0\templates\import\ImportStandardLineageTemplateWithPipelines.csv.

The CSV columns are as follows. Note that not all columns are used in all situations. See subsequent sections for details.

(0) Pipeline columns (optional) Description
PIPELINE_NAME This identifies a set of related pipeline steps
PIPELINE_STEP 1, 2, 3, …
(1) Input Source columns Description ( * indicates required fields)
SOURCE_DATABASE_TYPE * Supported values: DB2, MSSQL, NETEZZA, ORACLE, TERADATA, FILE SYSTEM, HIVE.
Any other value is treated as a custom object; see Custom Sources and Targets (below) for details.
SOURCE_SERVER_NAME If applicable, hostname of the …_DATABASE_TYPE
SOURCE_DATABASE_NAME if applicable, the database (server instance) or directory path
SOURCE_DATABASE_SCHEMA if applicable, the database/schema (persistent data)
SOURCE_DATASET_NAME Name of the data element’s table/view/synonym/directory-structure, etc.
SOURCE_DATASET_TYPE Type of the data element’s table/view/synonym/directory-structure, etc.
SOURCE_DATA_ELEMENT_NAME Name of the specific element/column of the dataset that the executable uses
(2) Executable columns Description
EXECUTABLE_SYSTEM_TYPE All executables have an executable system type. For example, if the executable is a stored procedure or a trigger (etc.) the executable system type might be ORACLE, MSSQL, TERADATA, PLSQL but it can be also IFPC, MSSQL scripts, Oracle scripts etc. for external scripts and workflows. For more details in scripts or custom executables see the corresponding sections.
EXECUTABLE_SERVER_NAME hostname of the executable
EXECUTABLE_DATABASE_NAME if applicable, the exe’s declaring database (server instance) or script’s file-path
EXECUTABLE_SCHEMA_NAME if applicable, the executable’s database/schema (persistent data)
EXECUTABLE_NAME * the program, script, stored procedure, job, etc.
EXECUTABLE_TYPE *

Allowed values:

  • SCRIPT
  • SOFTWARE FUNCTION
  • ETL JOB
  • STORED PROCEDURE
  • SOFTWARE MACRO
  • SOFTWARE TRIGGER
  • VIEW
  • SYNONYM
  • INFORMATICA SESSION
  • CUSTOM TRANSFORM
(3) Output Target columns Description ( * indicates required fields)
TARGET_DATABASE_TYPE * Supported values: DB2, MSSQL, NETEZZA, ORACLE, TERADATA, FILE SYSTEM, HIVE.
Any other value is treated as a custom object;see Custom Sources and Targets (below) for details.
TARGET_SERVER_NAME If applicable, hostname of the …_DATABASE_TYPE
TARGET_DATABASE_NAME if applicable, the database (server instance) or directory path
TARGET_DATABASE_SCHEMA if applicable, the database/schema (persistent data)
TARGET_DATASET_NAME Name of the data element’s table/view/synonym/directory-structure, etc.
TARGET_DATASET_TYPE Type of the data element’s table/view/synonym/directory-structure, etc.
TARGET_DATA_ELEMENT_NAME Name of the specific element/column of the dataset that the executable uses

Detailed Examples for Sources and Targets

The properties of source and target data may vary depending on their type (as declared in SOURCE_DATABASE_TYPE /  TARGET_DATABASE_TYPE). The following sections indicate the required values and constraints for the different types of data sources or targets.

Relational Database Sources and Targets

(1) Input Source columns (3) Output Target columns Description ( * indicates required fields)
SOURCE_DATABASE_TYPE * TARGET_DATABASE_TYPE * Supported RDB values: DB2, MSSQL, NETEZZA, ORACLE, TERADATA.
Any other value is treated as a custom type;see Custom Sources and Targets (below) for details. for details.
SOURCE_SERVER_NAME TARGET_SERVER_NAME Optional: Hostname of the corresponding …_DATABASE_TYPE
SOURCE_DATABASE_NAME * TARGET_DATABASE_NAME * Name of the database (server instance)
SOURCE_DATABASE_SCHEMA TARGET_DATABASE_SCHEMA Optional:  Name of the database/schema (persistent data)
SOURCE_DATASET_NAME * TARGET_DATASET_NAME * Name of the corresponding TABLE, VIEW, or SYNONYM
SOURCE_DATASET_TYPE * TARGET_DATASET_TYPE * Supported RDB values: TABLE, VIEW, or SYNONYM
SOURCE_DATA_ELEMENT_NAME * TARGET_DATA_ELEMENT_NAME * Name of the specific element/column of the TABLE, VIEW, or SYNONYM that the executable uses

Example NeighborGram for relational database I/O

Relational database CSV row:

Corresponding Neighborgram:

File Sources and Targets

(1) Input Source columns (3) Output Target columns Description ( * indicates required fields)
SOURCE_DATABASE_TYPE * TARGET_DATABASE_TYPE * Supported value: FILE SYSTEM
SOURCE_SERVER_NAME TARGET_SERVER_NAME Hostname of the file if applicable
SOURCE_DATABASE_NAME * TARGET_DATABASE_NAME * Directory path where the file resides
SOURCE_DATABASE_SCHEMA TARGET_DATABASE_SCHEMA NOT APPLICABLE
SOURCE_DATASET_NAME TARGET_DATASET_NAME The name of the file
SOURCE_DATASET_TYPE TARGET_DATASET_TYPE Supported value: FILE
SOURCE_DATA_ELEMENT_NAME TARGET_DATA_ELEMENT_NAME Column of the file (1,2,3 etc.)

Example NeighborGram for a file output

An example SCRIPT has an input from a database column and an output to a column in a file:

Custom Sources and Targets

(1) Input Source columns (3) Output Target columns Description ( * indicates required fields)
SOURCE_DATABASE_TYPE * TARGET_DATABASE_TYPE *

Any unsupported value is considered to be a custom Data Container, and the source element is considered a Custom element.

NOTE: this field is required, so if the custom object does not belong to a specific container, the field can have value: Custom Data Container

SOURCE_SERVER_NAME TARGET_SERVER_NAME NOT APPLICABLE
SOURCE_DATABASE_NAME TARGET_DATABASE_NAME NOT APPLICABLE
SOURCE_DATABASE_SCHEMA TARGET_DATABASE_SCHEMA NOT APPLICABLE
SOURCE_DATASET_NAME TARGET_DATASET_NAME Level hierarchy if exists without the last level that is stored in the ELEMENT ( eg. type-level5/type-level4/type-level3/type-level2)
SOURCE_DATASET_TYPE TARGET_DATASET_TYPE Type of the custom Element if Exists. If the type is an existing class in EDG then the custom object will be created as an instance of this type. In all cases, the custom object will be type of CustomObject
SOURCE_DATA_ELEMENT_NAME TARGET_DATA_ELEMENT_NAME Name of the specific element/column of the dataset that the executable uses

Detailed Examples for Executables

Executables can either reside in relational databases (e.g., stored procedures, functions, triggers) or be (external) scripts. Also, whenever the row describes the creation of a view, synonym or a custom target, then the executable is correspondingly VIEW, SYNONYM, or CUSTOM TRANSFORM. The following sections describe different executable types.

Relational Database Executables

NOTE: To declare the create View / Synonym function in a database, the EXECUTABLE_TYPE is VIEW or SYNONYM.

Executables (for Relational Databases)
EXECUTABLE_SYSTEM_TYPE Supported RDB values: DB2, MSSQL, NETEZZA, ORACLE, TERADATA, HIVE .
EXECUTABLE_SERVER_NAME if applicable, hostname of the script
EXECUTABLE_DATABASE_NAME The executable’s (stored procedure’s) database (server instance)
EXECUTABLE_SCHEMA_NAME if applicable, the executable’s (stored procedure’s) database/schema (persistent data)
EXECUTABLE_NAME * name of the script
EXECUTABLE_TYPE *

Allowed values:

  • SOFTWARE FUNCTION
  • ETL JOB
  • STORED PROCEDURE
  • SOFTWARE MACRO
  • SOFTWARE TRIGGER
  • VIEW
  • SYNONYM
  • INFORMATICA SESSION

Script Executables

Executables (for Scripts)
EXECUTABLE_SYSTEM_TYPE When EXECUTABLE_TYPE is SCRIPT, then see Appendix A (below) for the supported script-type values of EXECUTABLE_SYSTEM_TYPE. If the script type is unsupported, then the SCRIPT will be created in EDG without a specific type.
EXECUTABLE_SERVER_NAME if applicable, h ostname of the script
EXECUTABLE_DATABASE_NAME if applicable, the directory structure of the executable. It will be created as a FILE SYSTEM in edg
EXECUTABLE_SCHEMA_NAME NOT APPLICABLE
EXECUTABLE_NAME * name of the script
EXECUTABLE_TYPE * Value: SCRIPT

Custom Executables

Executables are custom when they have CUSTOM TRANSFORM. Conventionally it covers the cases where the target object is custom. In EDG, the executable it is created as a Mapping between the source element and the target custom object

Executables (for Custom executables)
EXECUTABLE_SYSTEM_TYPE This is the executable’s container, which is often the same as the target database type, but this is not restricted.
EXECUTABLE_SERVER_NAME NOT APPLICABLE
EXECUTABLE_DATABASE_NAME NOT APPLICABLE
EXECUTABLE_SCHEMA_NAME NOT APPLICABLE
EXECUTABLE_NAME *

Name of the custom transform. We consider it to be

the name of the target custom element

EXECUTABLE_TYPE * Value: CUSTOM TRANSFORM

Example row of a custom executable and related NeighborGram

Appendix A: Executable System Types

Supported values for EXECUTABLE_SYSTEM_TYPE:

  • DB2 SCRIPTS
  • HIVE SCRIPTS
  • IFPC
  • JAVA
  • MSSQL SCRIPTS
  • MSSQL TRANSACT-SQL
  • NETEZZA SCRIPTS
  • ORACLE PL/SQL
  • ORACLE SCRIPTS
  • PLSQL
  • PYTHON
  • SCALA
  • TERADATA BTEQ
  • TERADATA PT
  • TERADATA SCRIPTS