
PPDS Presentation

Deadline for submission: July 7th, EOD

This repository will contain all code required for the presentation, including code for gathering, processing and plotting benchmark data.

Presentation

Outline

This is a WIP proposal; please change it as you see fit.

  1. Overview
    • Briefly: the work steps and results
    • For this, set up criterion with the naive, simple and adaptive compilers and show the progress in graphs
  2. Lessons learned
    • What did the implementation show?
      • What worked?
        • Query rewriting (effect of the fuse and predicate passes)
      • What did not?
        • Portable SIMD (microarchitecture problems; see the sketch after this outline)
        • ONC
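
To make the portable-SIMD point concrete, here is a minimal sketch of the kind of kernel std::simd provides (nightly-only); `sum_gt`, the filtered-sum workload and the 8×i64 lane width are illustrative assumptions, not kernels from this project. The lane width is fixed at compile time, and how it maps onto each cluster's vector units depends on the target-cpu the benchmark is built for, which is one place where microarchitecture differences surface.

```rust
// Sketch only: an illustrative filtered-sum kernel, not code from this repository.
// std::simd (portable SIMD) is nightly-only and needs the feature gate below.
#![feature(portable_simd)]
use std::simd::prelude::*;

/// Sums all values greater than `threshold`, 8 i64 lanes at a time.
fn sum_gt(values: &[i64], threshold: i64) -> i64 {
    let t = i64x8::splat(threshold);
    let zero = i64x8::splat(0);
    let mut acc = zero;
    let mut chunks = values.chunks_exact(8);
    for chunk in &mut chunks {
        let v = i64x8::from_slice(chunk);
        // Keep lanes above the threshold, zero out the rest, then accumulate.
        acc += v.simd_gt(t).select(v, zero);
    }
    // Scalar tail for the leftover elements.
    let tail: i64 = chunks.remainder().iter().copied().filter(|&v| v > threshold).sum();
    acc.reduce_sum() + tail
}

fn main() {
    let data: Vec<i64> = (0..100).collect();
    assert_eq!(sum_gt(&data, 49), (50..100).sum::<i64>());
    println!("filtered sum: {}", sum_gt(&data, 49));
}
```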

Backup slides

  • Test system specs
  • Interfaces & Constraints from Rust

Notes on building

This uses a derivative of my LaTeX flake.

  • to get a continuous preview (you need to open the PDF yourself): nix run .#preview
  • to build a final PDF: nix run .#build
  • to compress a PDF from .#build with ghostscript: nix run .#compress
  • to clean up build artifacts: nix run .#clean

You may use it, or your own TeXLive instance, to build the presentation. Please commit an up-to-date PDF build of the presentation along with any TeX changes in each commit.

Benchmarks

We have two days to collect and plot the benchmark data. Currently outstanding:

  • what do we want to benchmark? which queries are we evaluating?
    • Effect of SIMD (on/off tests with simple vs adaptive compiler)
    • Effect of query rewriting (a/b test for some queries)
    • Progress in terms of perf, naive -> simple -> adaptive (SIMD)
  • how do we achieve this?
    • Effect of SIMD: tests with the simple and adaptive compiler on a subset of the benchmark queries, e.g., showing the worst and best case?
    • Effect of query rewriting:
      • Show difference between fused scans and normal scans
      • Show difference with predicate rewrite and NOP push-up
    • Progress in terms of perf:
      • Hook up the benchmarks and criterion, run them, and aggregate the data (see the sketch after this list)
  • collect benchmark data:
    • run for each cluster:
      • Cascade Lake AP: standard96:test
      • Genoa: genoa-cpu:all
      • Sapphire Rapids: gpu-pvc:all
      • Ice Lake: gpu-a100:all
    • Postprocess and aggregate
      • extract data from logs
      • parse and dump into some sort of container (polars or duckdb)
    • Plot out aggregated data
      • likely plotly with a jupyter notebook, then export to PNGs
      • we need to do this manually; however, we can emulate criterion's plots
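
A minimal sketch of the criterion hookup mentioned above, assuming one benchmark group per query with the backend as the parameter; `compile_and_run` and `Q1` are hypothetical placeholders for the crate's real entry point and query set.

```rust
// Sketch only: wiring one query into criterion across the three compiler backends.
// `compile_and_run` and `Q1` are hypothetical placeholders for the real entry point.
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

const Q1: &str = "SELECT ... FROM ..."; // placeholder query text

fn compile_and_run(backend: &str, query: &str) -> usize {
    // Placeholder: call into the naive/simple/adaptive compiler here and return
    // something (e.g., the result cardinality) so the work is not optimized away.
    backend.len() + query.len()
}

fn bench_backends(c: &mut Criterion) {
    let mut group = c.benchmark_group("q1");
    for backend in ["naive", "simple", "adaptive"] {
        group.bench_with_input(BenchmarkId::from_parameter(backend), &backend, |b, &backend| {
            b.iter(|| compile_and_run(backend, Q1));
        });
    }
    group.finish();
}

criterion_group!(benches, bench_backends);
criterion_main!(benches);
```

If the SIMD path is gated behind a cargo feature, the SIMD on/off comparison can reuse the same group across two builds of the benchmark binary.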

Contains the code used for the benchmarks and (TBD) the notebooks used for generating the plots.
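
For the "extract data from logs" step, here is a minimal sketch that walks criterion's default output directory and dumps the mean estimates into a CSV, which can then be loaded into polars or duckdb for aggregation and plotting as planned above. It assumes serde_json as a dependency; the `target/criterion/.../new/estimates.json` layout and the `mean.point_estimate` field (nanoseconds) are criterion defaults, but verify them against the actual output produced on the clusters.

```rust
// Sketch only: walk target/criterion/ and collect mean estimates into a CSV that
// polars/duckdb can ingest. Paths and JSON field names follow criterion's default
// output layout; double-check them against the data produced on the clusters.
use std::{error::Error, fs, path::Path};

fn collect(dir: &Path, rows: &mut Vec<String>) -> Result<(), Box<dyn Error>> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_dir() {
            continue;
        }
        let estimates = path.join("new").join("estimates.json");
        if estimates.is_file() {
            let json: serde_json::Value = serde_json::from_str(&fs::read_to_string(&estimates)?)?;
            if let Some(mean_ns) = json.pointer("/mean/point_estimate").and_then(|v| v.as_f64()) {
                // Use the directory path (group/benchmark id) as the label.
                rows.push(format!("{},{}", path.display(), mean_ns));
            }
        } else {
            collect(&path, rows)?; // recurse into benchmark groups
        }
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn Error>> {
    let mut rows = vec!["benchmark,mean_ns".to_string()];
    collect(Path::new("target/criterion"), &mut rows)?;
    fs::write("criterion_means.csv", rows.join("\n"))?;
    Ok(())
}
```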