Myriad Data Generator Toolkit
=============================
*Myriad* is a development toolkit for scalable parallel data generators. Generating large sets of synthetic data according to a predefined schema and a set of statistical constraints is a challenging yet increasingly important task, especially for benchmarking and testing systems that manage and process web-scale data, such as [Hadoop](http://hadoop.apache.org) or parallel RDBMSs like [DB2](http://www-01.ibm.com/software/data/db2/). Myriad aims to ease this process by providing a fast and easy way to create your own data generators. All generators created with the toolkit support parallelization on shared-nothing architectures.

*Myriad* is developed in the context of the [Stratosphere](http://www.stratosphere.eu) project as an ongoing collaboration between the [Database Systems Research Group, TU Berlin](http://www.dima.tu-berlin.de) and the [IBM Center for Advanced Studies, Toronto](https://www-927.ibm.com/ibm/cas/canada/research/index.shtml).