Getting Started
===============

This tutorial gives a brief overview of the Myriad internals.

1. Bootstrap Your Project
-------------------------

To bootstrap the development of a new *Myriad*-based data generator project, follow these steps.

First, create the root folder for the new project (let's call it *my-datagen*) and set up the initial folder structure. Assuming you want to use git to version the source code of your generator, the recommended way to do this is to add the [myriad-toolkit](https://github.com/TU-Berlin-DIMA/myriad-toolkit) project as a git submodule:

```bash
my_datagen=my-datagen # name your data generator
mkdir "$my_datagen"
cd "$my_datagen"
mkdir vendor
git init
git submodule add https://github.com/TU-Berlin-DIMA/myriad-toolkit.git vendor/myriad-toolkit
```

The *Myriad Toolkit* comes with a command line assistant tool, available under `vendor/myriad-toolkit/bin/assistant`, that supports common development tasks. Because you will probably use this tool a lot (especially if you intend to develop a new generator from scratch), we suggest creating a soft link to it under the project root:

```bash
ln -s vendor/myriad-toolkit/bin/assistant myriad-assistant
```

To list the tasks supported by the assistant, call it without any options or arguments:

```bash
./myriad-assistant
```

The first common task the assistant can handle is the initialization of a new, empty project:

```bash
./myriad-assistant initialize:project --ns=MyDataGen $my_datagen
```

This creates the basic structure of a new generator project called *my-datagen* and uses the C++ namespace *MyDataGen* as the default namespace for all C++ library extensions. When the task is complete, you will see two new directories (`build` and `src`) as well as several other files in your root folder. The input parameters of the `initialize:project` task are stored in *my-datagen/.myriad-settings* and serve as default values for all other tasks (e.g. `compile:prototype`).

2. Specify the Data Generator Program
-------------------------------------

The *Myriad* toolkit promotes a general-purpose data generation model centered around the generation of pseudo-random sequences of user-defined domain types. To fully specify a *Myriad* data generator, you must provide a family of *domain types* and an associated family of *pseudo-random domain type generators (PRDGs)*. At runtime, the PRDG functions are applied iteratively to generate the pseudo-random sequences of the corresponding domain types.

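
As a rough illustration of this model, the following C++ sketch pairs one domain type with one PRDG function and applies it iteratively. The names `Customer`, `generateCustomer`, and `generateSequence`, the age range, and the way the seed is mixed with the position are assumptions made for this example; they are not part of the Myriad runtime library. Because every record depends only on the seed and its position, calling `generateSequence(42, 5)` twice should yield identical records.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical domain type: one record of the generated data set.
struct Customer {
    std::uint64_t id;
    int age;
};

// Illustrative value domain for the `age` field (assumed, not from Myriad).
const int kMinAge = 18;
const int kMaxAge = 90;

// Hypothetical PRDG: maps a sequence position to a pseudo-random record.
// The engine is seeded from (seed, position), so any record can be
// regenerated on its own without producing the records before it.
Customer generateCustomer(std::uint64_t seed, std::uint64_t position) {
    std::mt19937_64 rng(seed ^ (position * 0x9E3779B97F4A7C15ULL));
    const std::uint64_t span =
        static_cast<std::uint64_t>(kMaxAge - kMinAge + 1);

    Customer record;
    record.id = position;
    record.age = kMinAge + static_cast<int>(rng() % span);
    assert(record.age >= kMinAge && record.age <= kMaxAge);
    return record;
}

// Applies the PRDG iteratively to produce a sequence of `count` records.
std::vector<Customer> generateSequence(std::uint64_t seed,
                                       std::uint64_t count) {
    std::vector<Customer> records;
    records.reserve(count);
    for (std::uint64_t position = 0; position < count; ++position) {
        records.push_back(generateCustomer(seed, position));
    }
    return records;
}
```
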
The specification can be implemented at one of two levels: as a high-level *XML specification of a data generator prototype*, or directly at the code level in one of the C++ classes extending the *Myriad runtime library*. The XML layer is ideal for rapid prototyping and probably sufficient for simple relational use cases, whereas code-level extensions are useful when tailor-made data generation logic is required.

When the *Myriad prototype compiler* is invoked for the first time, it generates three groups of C++ sources:

* (A) a family of domain types (located under *src/cpp/record*),
* (B) an associated family of PRDG functions (also called *setter chains*, located under *src/cpp/runtime/setter*), and
* (C) a generator configuration that reads the domains and distributions of the values required by the PRDG functions (located under *src/cpp/config*).

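
To make item (C) more concrete, here is a minimal sketch of how a value domain with an associated discrete distribution might be read and then sampled by a PRDG function. The text format (one `value probability` pair per line) and the names `WeightedValue`, `readDomain`, and `pick` are hypothetical; the sources generated under *src/cpp/config* may look quite different. With the input `"gold 0.1\nsilver 0.3\nbronze 0.6\n"`, `pick(domain, 0.25)` selects `silver`, because 0.25 falls into the second cumulative interval.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical entry of a value domain: a value and its probability.
struct WeightedValue {
    std::string value;
    double probability;
};

// Parses whitespace-separated "value probability" pairs.
// The format is an assumption of this sketch, not a Myriad file format.
std::vector<WeightedValue> readDomain(const std::string& text) {
    std::istringstream in(text);
    std::vector<WeightedValue> domain;
    std::string value;
    double probability = 0.0;
    while (in >> value >> probability) {
        domain.push_back({value, probability});
    }
    return domain;
}

// Inverse-CDF lookup: maps a uniform number u in [0, 1) to a domain value
// by walking the cumulative probabilities.
const std::string& pick(const std::vector<WeightedValue>& domain, double u) {
    assert(!domain.empty());
    double cumulative = 0.0;
    for (const WeightedValue& entry : domain) {
        cumulative += entry.probability;
        if (u < cumulative) {
            return entry.value;
        }
    }
    // Fallback for rounding errors when the probabilities sum to just under 1.
    return domain.back().value;
}
```
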
All sources are generated as a pair consisting of a main class and a corresponding base class located in the *base* sub-folder. All logic derived from the XML specification is contained in the base classes, while the main classes serve as extension points in which you can override specific base-class methods. Subsequent invocations of the compiler do not touch existing main classes, so you can modify and re-compile the XML specification even after adding custom logic at the code level. Code-level extensions are therefore not an alternative to the XML specification, but a complement to it. To keep the specification structure clear, we advise you to use the XML specification as much as possible and to fall back to code-level extensions only when they are absolutely necessary.

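
The split between regenerated base classes and hand-edited main classes can be sketched as follows. The class names, the `Order` record, and the overridable methods are invented for this illustration and do not reproduce the code the compiler actually emits; only the *MyDataGen* namespace is taken from the example project above. In this sketch, re-running a generator would replace `BaseOrderSetterChain` but leave the override in `OrderSetterChain` untouched, so `apply` yields the status `REVIEW` for position 3 (total 40.0) and `OPEN` for position 0 (total 10.0).

```cpp
#include <cassert>
#include <string>

namespace MyDataGen {

// Hypothetical record type standing in for a generated domain type.
struct Order {
    long id = 0;
    double total = 0.0;
    std::string status;
};

// Stand-in for a class in a "base" sub-folder: it holds the logic derived
// from the specification and may be regenerated at any time.
class BaseOrderSetterChain {
public:
    virtual ~BaseOrderSetterChain() = default;

    // Fills all fields of the record for the given sequence position.
    void apply(Order& order, long position) const {
        assert(position >= 0);
        order.id = position;
        order.total = computeTotal(position);
        order.status = computeStatus(order);
    }

protected:
    // Overridable steps with default, specification-derived behaviour.
    virtual double computeTotal(long position) const {
        return 10.0 * static_cast<double>(position % 5 + 1);
    }
    virtual std::string computeStatus(const Order&) const {
        return "OPEN";
    }
};

// Stand-in for the main class: created once and edited by hand.
// Custom logic lives here, as an override of a base-class method.
class OrderSetterChain : public BaseOrderSetterChain {
protected:
    std::string computeStatus(const Order& order) const override {
        if (order.total >= 40.0) {
            return "REVIEW";  // tailor-made rule added at the code level
        }
        return BaseOrderSetterChain::computeStatus(order);
    }
};

}  // namespace MyDataGen
```
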
You can find out more about the XML dialect supported by the *Myriad compiler* in the [XML Specification Reference Manual](/TU-Berlin-DIMA/myriad-toolkit/wiki/XML-Specification-Reference-Manual).

3. Build the Data Generator Binary
----------------------------------

Before you start the build process, make sure your build configuration is up to date by running the *./configure* script. To see the supported build options, type:

```bash
./configure --help
```

The *./configure* script creates a file named *makefile.defs* that contains all build variables. To start the build process, go to the *build* folder and type:

```bash
make all
```

When the compilation has finished, you can deploy the compiled data generator to a separate folder with:

```bash
make install
```

This copies the contents of the *build/bin*, *build/config* and *build/lib* folders into a new folder *my-datagen* located under *MYRIAD_INSTALL_DIR* (check *makefile.defs* for the current *MYRIAD_INSTALL_DIR* value).