Verified Commit 0817cbab authored by Joshua Balthasar Kobschätzki

chore: add notes

parent 2ab048d4
## Bullet point outline
1. Structure of the compiler
   1. Compiled into a boxed trait object
2. E: Passes
3. IR is lowered into Operators
   1. SIMD may be chosen
4. J: Benchmarks
   1. C++ reference compiled with GCC 11 and `-march=native`
   2. Outliers were filtered out; numbers represent the average over 10 runs
   3. Resolution on the x-axis differs due to harness limitations
   4. Systems: Intel Sapphire Rapids, AMD Genoa, and Intel Ice Lake
   5. Sponsored by ZIB
   6. Comparison on Sapphire Rapids: 2-3x speedup, up to 8x on larger datasets. Small sizes are dominated by SIMD overhead (unpacking)
   7. gpu-pvc:
   8. Ice Lake shows the same performance trend, with a slight downturn for SIMD operators due to microarchitectural differences
```
Q1.1  adaptive       1.80
      cpp-reference  4.71
Q1.1  adaptive       +0.3
      cpp-reference  +0.1

@ 2.2: adaptive speedup of only 2x, because group_by_col is not well suited for vectorization.
```
5. E: Simple vs Adaptive
6. J: Learning outcomes
   - High-level SIMD is ergonomic but not optimal: speedup on large datasets, but quite copy-heavy
   - No full access to instruction sets (portable, no shuffle / compress)
   - Intrinsics would be significantly faster
   - Microarchitecture differences
   - Tried it for a part, and it worked, but didn't finish before the deadline
   - Data structures: ONC vs chunked ONC. Unfit for vectorization.
   - Cost of unpacking = 10-20%
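
The "compiled into boxed trait" and "IR is lowered into Operators / SIMD may be chosen" points above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the trait name `Operator`, the `lower_add` function, and the batch-size heuristic are all assumptions; the chunked kernel stands in for a real portable-SIMD kernel.

```rust
// Sketch: IR nodes are lowered into boxed operator objects, and lowering
// may pick a chunked (SIMD-style) kernel or a plain scalar kernel for the
// same logical operation. All names here are illustrative.
trait Operator {
    fn execute(&self, input: &[i64]) -> Vec<i64>;
}

struct ScalarAdd { rhs: i64 }
impl Operator for ScalarAdd {
    fn execute(&self, input: &[i64]) -> Vec<i64> {
        input.iter().map(|x| x + self.rhs).collect()
    }
}

// Stand-in for a SIMD kernel: processes fixed-width chunks, then a scalar
// tail. In the real engine this would use portable SIMD lanes instead.
struct ChunkedAdd { rhs: i64 }
impl Operator for ChunkedAdd {
    fn execute(&self, input: &[i64]) -> Vec<i64> {
        let mut out = Vec::with_capacity(input.len());
        let mut chunks = input.chunks_exact(4);
        for c in &mut chunks {
            out.extend([c[0] + self.rhs, c[1] + self.rhs,
                        c[2] + self.rhs, c[3] + self.rhs]);
        }
        out.extend(chunks.remainder().iter().map(|x| x + self.rhs));
        out
    }
}

// Lowering step: choose the chunked kernel only when batches are large
// enough to amortize the unpacking overhead mentioned in the notes.
fn lower_add(rhs: i64, expected_batch: usize) -> Box<dyn Operator> {
    if expected_batch >= 4 {
        Box::new(ChunkedAdd { rhs })
    } else {
        Box::new(ScalarAdd { rhs })
    }
}

fn main() {
    let op = lower_add(10, 1024);
    assert_eq!(op.execute(&[1, 2, 3, 4, 5]), vec![11, 12, 13, 14, 15]);
}
```

The boxed `dyn Operator` gives a uniform pipeline interface at the cost of dynamic dispatch per call; batching amortizes that cost, which is one reason the chunked layout matters.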
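
On the "ONC vs chunked ONC" point: ONC is the classic Open-Next-Close iterator model, which yields one row per `next()` call; the chunked variant yields a batch per call, which is what makes vectorized kernels applicable at all. A minimal sketch, with assumed trait and struct names (not the project's API):

```rust
// Classic ONC: one row per call. Per-row virtual calls leave no room
// for SIMD inside an operator.
trait RowIter {
    fn next_row(&mut self) -> Option<i64>;
}

// Chunked ONC: one batch per call; kernels can vectorize over the batch.
trait ChunkIter {
    fn next_chunk(&mut self) -> Option<Vec<i64>>;
}

struct Scan { data: Vec<i64>, pos: usize }
impl RowIter for Scan {
    fn next_row(&mut self) -> Option<i64> {
        let v = self.data.get(self.pos).copied();
        self.pos += 1;
        v
    }
}

struct ChunkedScan { data: Vec<i64>, pos: usize, chunk: usize }
impl ChunkIter for ChunkedScan {
    fn next_chunk(&mut self) -> Option<Vec<i64>> {
        if self.pos >= self.data.len() {
            return None;
        }
        let end = (self.pos + self.chunk).min(self.data.len());
        let out = self.data[self.pos..end].to_vec();
        self.pos = end;
        Some(out)
    }
}

fn main() {
    let data: Vec<i64> = (1..=10).collect();

    let mut rows = Scan { data: data.clone(), pos: 0 };
    let mut per_row = Vec::new();
    while let Some(r) = rows.next_row() {
        per_row.push(r);
    }

    let mut chunks = ChunkedScan { data, pos: 0, chunk: 4 };
    let mut per_chunk = Vec::new();
    while let Some(c) = chunks.next_chunk() {
        per_chunk.extend(c);
    }

    // Same tuples either way; only the call granularity differs.
    assert_eq!(per_row, per_chunk);
}
```

Both produce identical output; the chunked form trades one virtual call per row for one per batch, at the cost of materializing (and, per the notes, unpacking) the batch.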