Verified Commit 0817cbab authored by Joshua Balthasar Kobschätzki

chore: add notes

parent 2ab048d4
## Bullet point outline
1. Structure of the compiler
   1. Compiled into a boxed trait object
2. E: Passes
3. IR is lowered into Operators
   1. SIMD may be chosen
4. J: Benchmarks
   1. C++ reference compiled with GCC 11 and `-march=native`
   2. Outliers were filtered out; numbers represent the average over 10 runs
   3. Resolution on the x-axis differs due to harness limitations
   4. Systems: Intel Sapphire Rapids, AMD Genoa, and Intel Ice Lake
   5. Sponsored by ZIB
   6. Comparison on Sapphire Rapids: 2-3x speedup, up to 8x on larger datasets. Small sizes are dominated by SIMD overhead (unpacking)
   7. gpu-pvc:
   8. Ice Lake shows the same performance trend, with a slight downturn for SIMD operators due to microarchitectural differences
```
Q1.1  adaptive       1.80
      cpp-reference  4.71
Q1.1  adaptive       +0.3
      cpp-reference  +0.1

@ 2.2: adaptive speedup of only 2x, because group_by_col is not well suited for vectorization.
```
5. E: Simple vs Adaptive
6. J: Learning outcomes
   - High-level SIMD is ergonomic but not optimal: speedup on large datasets, but quite copy-heavy
   - No full access to instruction sets (portable, no shuffle / compress)
   - Intrinsics would be significantly faster
   - Microarchitecture differences
   - Tried it for a part, and it worked, but didn't finish before the deadline
   - Data structures: ONC vs chunked ONC. Unfit for vectorization.
   - Cost of unpacking = 10-20%
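
The "compiled into boxed trait" and "IR is lowered into Operators / SIMD may be chosen" points above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the trait name `Operator`, the `lower_add` function, and the batch-size heuristic are all assumptions; the chunked kernel stands in for a real portable-SIMD kernel.

```rust
// Sketch: IR nodes are lowered into boxed operator objects, and lowering
// may pick a chunked (SIMD-style) kernel or a plain scalar kernel for the
// same logical operation. All names here are illustrative.
trait Operator {
    fn execute(&self, input: &[i64]) -> Vec<i64>;
}

struct ScalarAdd { rhs: i64 }
impl Operator for ScalarAdd {
    fn execute(&self, input: &[i64]) -> Vec<i64> {
        input.iter().map(|x| x + self.rhs).collect()
    }
}

// Stand-in for a SIMD kernel: processes fixed-width chunks, then a scalar
// tail. In the real engine this would use portable SIMD lanes instead.
struct ChunkedAdd { rhs: i64 }
impl Operator for ChunkedAdd {
    fn execute(&self, input: &[i64]) -> Vec<i64> {
        let mut out = Vec::with_capacity(input.len());
        let mut chunks = input.chunks_exact(4);
        for c in &mut chunks {
            out.extend([c[0] + self.rhs, c[1] + self.rhs,
                        c[2] + self.rhs, c[3] + self.rhs]);
        }
        out.extend(chunks.remainder().iter().map(|x| x + self.rhs));
        out
    }
}

// Lowering step: choose the chunked kernel only when batches are large
// enough to amortize the unpacking overhead mentioned in the notes.
fn lower_add(rhs: i64, expected_batch: usize) -> Box<dyn Operator> {
    if expected_batch >= 4 {
        Box::new(ChunkedAdd { rhs })
    } else {
        Box::new(ScalarAdd { rhs })
    }
}

fn main() {
    let op = lower_add(10, 1024);
    assert_eq!(op.execute(&[1, 2, 3, 4, 5]), vec![11, 12, 13, 14, 15]);
}
```

The boxed `dyn Operator` gives a uniform pipeline interface at the cost of dynamic dispatch per call; batching amortizes that cost, which is one reason the chunked layout matters.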
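
On the "ONC vs chunked ONC" point: ONC is the classic Open-Next-Close iterator model, which yields one row per `next()` call; the chunked variant yields a batch per call, which is what makes vectorized kernels applicable at all. A minimal sketch, with assumed trait and struct names (not the project's API):

```rust
// Classic ONC: one row per call. Per-row virtual calls leave no room
// for SIMD inside an operator.
trait RowIter {
    fn next_row(&mut self) -> Option<i64>;
}

// Chunked ONC: one batch per call; kernels can vectorize over the batch.
trait ChunkIter {
    fn next_chunk(&mut self) -> Option<Vec<i64>>;
}

struct Scan { data: Vec<i64>, pos: usize }
impl RowIter for Scan {
    fn next_row(&mut self) -> Option<i64> {
        let v = self.data.get(self.pos).copied();
        self.pos += 1;
        v
    }
}

struct ChunkedScan { data: Vec<i64>, pos: usize, chunk: usize }
impl ChunkIter for ChunkedScan {
    fn next_chunk(&mut self) -> Option<Vec<i64>> {
        if self.pos >= self.data.len() {
            return None;
        }
        let end = (self.pos + self.chunk).min(self.data.len());
        let out = self.data[self.pos..end].to_vec();
        self.pos = end;
        Some(out)
    }
}

fn main() {
    let data: Vec<i64> = (1..=10).collect();

    let mut rows = Scan { data: data.clone(), pos: 0 };
    let mut per_row = Vec::new();
    while let Some(r) = rows.next_row() {
        per_row.push(r);
    }

    let mut chunks = ChunkedScan { data, pos: 0, chunk: 4 };
    let mut per_chunk = Vec::new();
    while let Some(c) = chunks.next_chunk() {
        per_chunk.extend(c);
    }

    // Same tuples either way; only the call granularity differs.
    assert_eq!(per_row, per_chunk);
}
```

Both produce identical output; the chunked form trades one virtual call per row for one per batch, at the cost of materializing (and, per the notes, unpacking) the batch.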