|
|
## About Oligos
|
|
|
|
|
|
*Oligos* is a profiling tool that extracts a generator specification from a given reference database, using the stored database statistics. Oligos collects database statistics such as table and column cardinality, most frequent values, distribution statistics and many more, and uses this information to create a dense description of the profiled database. This information is then used to create a *Myriad* XML Specification.
|
|
|
*Oligos* is a profiling tool that extracts a *Myriad* data generator specification from the available meta-information for a reference database (you can think of the reference database as a *gold standard* that the *Myriad* data generator should approximate). To do that Oligos collects schema information, as well as database statistics such as table and column cardinality, most frequent values, distribution statistics, and converts this information to create a *Myriad* XML Specification.
|
|
|
<!--
|
|
|
For more information about Oligos go to the [project homepage](https://bitbucket.org/carabolic/oligos).
|
|
|
-->
|
|
|
|
|
|
## Using Oligos with Myriad
|
|
|
## Using Oligos from the Myriad Assitant Tool
|
|
|
|
|
|
One way to generate a *Myriad* XML Specification is to use *Oligos* and a reference database. In order to do so you need to run the `compile:oligos` task. The only prerequisite to run the `compile:oligos` task is the specific JDBC driver for your database. There are two ways to add the JDBC driver to your *Myriad* project:
|
|
|
In order to do so you need to run the `compile:oligos` assistant tool task from your *Myriad* project. The only prerequisite to run the `compile:oligos` task is to configure the path to the specific JDBC driver for your database (currently, the only database that we support is DB2, but in the future we plan to add support for other widely used databases). There are two ways to configure the path to your JDBC driver:
|
|
|
|
|
|
* add the path to the `$CLASSPATH` environment variable, or
|
|
|
* add the path to the `MYRIAD_OLIGOS_CP` property in the `$PROJECT-HOME/.myriad-settings` file.
|
|
|
* set the path to the `MYRIAD_OLIGOS_CP` property in the `.myriad-settings` file of your *Myriad* project.
|
|
|
|
|
|
After setting the path to the specific JDBC driver you can start using *Myriad* with *Oligos*. The basic syntax of the `compile:oligos` assistant task is a follows
|
|
|
Once you have done that, you can start using *Oligos* from your *Myriad* project! The basic syntax of the `compile:oligos` task is a follows:
|
|
|
|
|
|
```
|
|
|
myriad-assistant compile:oligos -h [host] -P [port] -D [database] -u [username] -p [password] [schema]
|
|
|
myriad-assistant compile:oligos -h [HOST] -P [PORT] -D [DATABASE] -u [USER] -p [SCHEMA_SPEC]
|
|
|
```
|
|
|
|
|
|
Where `host` is the hostname of your database, `port` is the database port, `database` is the name of the database, and `username` and `password` are the credentials used for authentication. The `schema` states which columns should be profiled and has the following syntax (Backus Naur Form):
|
|
|
The task has the following (optional) parameters:
|
|
|
|
|
|
* `-h,--host HOST` - the hostname of your database,
|
|
|
* `-P,--port PORT` - the database port,
|
|
|
* `-D,--database DATABASE` - the name of the database,
|
|
|
* `-u,--username USERNAME` - the name the database user, and
|
|
|
* `-p` - a boolean flag indicating that the specified database user requires a passport for authentication.
|
|
|
|
|
|
The `SCHEMA_SPEC` parameter is a sequence of schema specifications that defines which schemata, tables, and columns should be profiled by the Oligos task. The `SCHEMA_SPEC` parameter has the following syntax (given in EBNF):
|
|
|
|
|
|
```bnf
|
|
|
SCHEMA = SCHEMA_DEFINITION { "," SCHEMA_DEFINITION }
|
|
|
SCHEMA_DEFINITION = SCHEMA_ID ["(" TABLE_SEQUENCE ")"]
|
|
|
TABLE_SEQUENCE = TABLE_DEFINITION { "," TABLE_DEFINITION }
|
|
|
TABLE_DEFINITION = TABLE_ID [ "(" COLUMN_SEQUENCE ")" ]
|
|
|
SCHEMA_SEQUENCE = SCHEMA_DEF { "," SCHEMA_DEF }
|
|
|
SCHEMA_DEF = SCHEMA_ID [ "(" TABLE_SEQUENCE ")" ]
|
|
|
TABLE_SEQUENCE = TABLE_DEF { "," TABLE_DEF }
|
|
|
TABLE_DEF = TABLE_ID [ "(" COLUMN_SEQUENCE ")" ]
|
|
|
COLUMN_SEQUENCE = COLUMN_ID { "," COLUMN_ID }
|
|
|
```
|
|
|
|
|
|
A schema consists of at least a `SCHEMA_ID` which is the name of the schema you want to profile. A `SCHEMA_ID` is followed by an optional sequence of `TABLE_DEFINITIONs` enclosed in parentheses and separated by comma. Each `TABLE_DEFINITION` in turn contains a mandatory `TABLE_ID` (read: table name) and an optional sequence of `COLUMN_IDs`. Omitting the `TABLE_SEQUENCE` or `COLUMN_SEQUENCE` clause is interpreted as wildcard and all tables (resp. columns), are profiled.
|
|
|
A schema specification consists of at least one `SCHEMA_ID`, which is the name of the schema you want to profile. Optionally, a `SCHEMA_ID` is followed by a sequence of table definitions enclosed in parentheses and separated by comma. Each `TABLE_DEFINITION` in turn contains a mandatory `TABLE_ID` and an optional sequence of `COLUMN_IDs`. Omitting the `TABLE_SEQUENCE` or `COLUMN_SEQUENCE` clause is interpreted as a wildcard to profile all tables (resp. columns).
|
|
|
|
|
|
<!-- graphical schema description
|
|
|
```
|
... | ... | @@ -48,27 +56,25 @@ SCHEMA_B, |
|
|
|
|
|
## Examples
|
|
|
|
|
|
Take a look at the following examples that illustrate the concrete syntax of the `compile:oligos` task.
|
|
|
|
|
|
### Reference Database
|
|
|
|
|
|
All of the following examples use the TPCH Schema. The schema is as follows:
|
|
|
Take a look at the following examples of some concrete calls of the `compile:oligos` task. All of the examples use the schema of the TPC-H benchmark. The schema is as follows:
|
|
|
|
|
|
<center>![TPCH Schema](https://www.ki.informatik.hu-berlin.de/wbi/teaching/archive/sose04/fosem/tpch-schema.png/image)</center>
|
|
|
|
|
|
It is assumed that the database is running on localhost on port 60000. The name of the database is TPCH and the default schema is DB2INST1. Furthermore the username is DB2INST1 and the password is mypass.
|
|
|
We assume that the database is running on localhost on port 60000, the name of the database is `TPCH`, and both the username and the default user schema are `DB2INST1`.
|
|
|
|
|
|
The first example will generate a data generator specification only for the `O_ORDERDATE` and `O_TOTALPRICE` columns from the `ORDERS` table:
|
|
|
|
|
|
### Profile O_ORDERDATE and O_TOTALPRICE
|
|
|
```bash
|
|
|
myriad-assistant compile:oligos -h localhost -P 60000 -D TPCH -u DB2INST1 -p mypass DB2INST1 (ORDERS (O_ORDERDATE, O_TOTALPRICE))
|
|
|
myriad-assistant compile:oligos -h localhost -P 60000 -D TPCH -u DB2INST1 -p "DB2INST1(ORDERS(O_ORDERDATE,O_TOTALPRICE))"
|
|
|
```
|
|
|
The next example will generate a specification for the `ORDERS` and `CUSTOMER` tables using all columns in these two tables:
|
|
|
|
|
|
### Profile ORDERS and CUSTOMER (complete)
|
|
|
```bash
|
|
|
myriad-assistant compile:oligos -h localhost -P 60000 -D TPCH -u DB2INST1 -p mypass DB2INST1 (ORDERS, CUSTOMER)
|
|
|
myriad-assistant compile:oligos -h localhost -P 60000 -D TPCH -u DB2INST1 -p mypass "DB2INST1 (ORDERS, CUSTOMER)"
|
|
|
```
|
|
|
|
|
|
### Profile the entire TPCH Database
|
|
|
The third example gives the most general schema specification, where Oligos will consider all tables and all columns located in the `DB2INST1`:
|
|
|
|
|
|
```bash
|
|
|
myriad-assistant compile:oligos -h localhost -P 60000 -D TPCH -u DB2INST1 -p mypass DB2INST1
|
|
|
myriad-assistant compile:oligos -h localhost -P 60000 -D TPCH -u DB2INST1 -p mypass "DB2INST1"
|
|
|
``` |
|
|
\ No newline at end of file |