Community
Implementer discussion regarding a low-level language for writing SIMD kernels.
Early drafts:
* [0](https://dzaima.github.io/paste/#0dVbNbuM2EL77KWYvWymQZWQbBIs6yaJpLhugCxQBWsA3WqQsOrLokpQt7TY99gF66QP1Tfok/YaUtE6CDRyHQw6/@ftmmL2yB@G0aUhqtxe@qJQl3@@V@2HGf0j4HR1UMaO19hR@VjQj0u@jcAtBn19G4YGF79/hFMJHFi4viLYQ7mmxINPUPemSdOMdiUZSWRvBS6vIqb2wwitKdK5yaoyn218@pTMqGUIC4o7NFoPZHyEUo9nfWGCzBYSfWO0MiwN@f53NZjAsVakb7TlKU5KvFO3a2ut9rQsRdvdW73B@ULMvuJ1ImZFV7vrL8ax7Stn1QjhFx0o1tDa@gsebwe1C1MKSNO26Vm7JPs@d72sVVWDtGCLtMkJiFW8MV0JyERzbu7vLOMcn9iB5Y/VnJQfs4MFyRAu2ow4c3WjnlXUZ7YR71M2GsHuskMwDaqkdVbhTA0m03uwQMByoe7ar9RAnm0Th/v0HXzNOKRfO0jXsYZWfUbcMKmENxCGGy4s5s@J5MuP9khJLb67J5oBMKVg4iJolDgVV94ZNLlEOAOIjqBafewIBi0cqEQIzxVOpoasbGj0jeOxb21C7Z5zEpsG1k5CXpOVjkMj1jRcducq0tYyWKoEqM8xT@A7@MYBtnZ/X@lGN@DXQkdF@rT5A8ynlhG23L4nBieLS5PQJxODkI67oGlZ85/7@ZXFRI8lMEE1j@nCFfUVReFNKuruj7Zbu70OV1e@tqDNat55UBTx8Rj6WIehXnAw3yIE86NIFmnMRc5fc3mb08JDRx48ZGK9i2bHg7KtOFa0HSUxTgKYNOS@sb/chScxV9BAdcwQV0s2rDfrFRR4zz5eIwJnBOpLQQYU5/wzB5yB0QMAiAoiRyAHpqBFKpLwkVaudanyMJWHwfJPT@w4V@@@vv@miu7xIIzyZHdjqcyNlEgnBvcB@YWfECf04mUvYuWzSq80R3VKJugxqwRkJKJ5EruUklUHXjWxao8e/ji3WH1xR7MqfZhfcACGbcC8e5g5Ch/NzurqCeNTSV/PzmBFHG8WNUU1zZUzAgP40MIm1G0NF3Qa3pg5qetoYI0/YwqPTHGEvOdJbMus0n7JvFeawsqg/bH4z7SMG@5x0E8a4zTMC@GdQGPfUZE6xKt3c3IyBTioT2rdVGFkxshqQeaQgwsTYm2vO4R@wfjXHKk3HWXAyBCIJQMNGdZ72obgo6zseJY8ZH/DDAgET@xm8muDVS3iMlTT6YmxeQQPKJ2VMsceprWh@mtyYWxDr8Dq1YarEjsZY8aF99Y5HqfZ4K8O7hVvrfnCBh4UAhdqiQmDoKMmRWCUgi0kd4fOLnlF4HYoWEu4hQLEPUxpFjjaZMCDUHL9xoryyDVLhzQgK/PCwMm/g6ShDZ4gaxuXkaQrqWSPk6CtTmfUHczwfitA8rwzRSi9W28VKLvRqsV0t5Gr2FN7uCvTSLnKcRyTfDDlEoG9JFL7l1@zrfy/yAyYxXgAl3vD1n3mA8zsS7YZ2DJ0r@0bs4lOItmsUd7Ekh2Ihl6FksGG1cienSP2Q9dB1wn/H44tbLhj8Hw%23C) - mostly dispatching, which is unrelated to the language
* [1](https://dzaima.github.io/paste/#0lVTNbuM2EL77KWYvhbxwZKQNgqJKnEvaF1jfih4ocWwRK5EGSUtKt@mxD9DLPtC@SZ@k31A/TbIJFgVkSxwOZ7755htut6S8Vw9fPqfXylxe//obtefm8lqpbFz1Gxo/hg39crZVNM6Ssz977/yaPq1o2vZ0S6ppXKUiZzg0rAvabil6w4GiI8/nwGQO@DhU7mxjSO49mbghF2v2vYHDHCIgMJ6D85T1Ff1EfUGDvIeCvLz9GttEHVcRPp8kV1Ure5RsNdOxcaVqqFXho7FHqpyNPEQBIjBUwI59AJajCZF9EItl1qxTVJIct8JE1yE9qsFvLjrVZSwQmxiWDCECdQF7YyxrCnxSHpbmIdUweymrybp5mXI9pv9QqUZNdYCkaYmCdCPn3IHY6kAmUMkRiFGksikOko0khA2MJvlgy51SpyS58FG59mQaOZcYeFGlMV9XOYF7XK22i0wmWJmUAUjoo089C25CPClp/UxK4ZmU/p@QEg5sdDR02Bjy0julKxVi8VIeoyoWRaxea2P3vMDHhffVa3S8cJ64mCudyQALjR5JKJl6qCKypSNbCAQukwJEbsH8zhQfTryh8hyp5ToRNdIUEk3fJun6KlHUvx8SNQfKPL27JTm8hs/knCXInuPZW/LFBD1ye2rAa0I0EvXl8/he3ex3tJ@42guMgcq3QEgNmg@0z@/pvqA06DBhdTQdJlDNXRAr9SbWpN25bKBVbrhlG0cuMnDG@TGnH4cfvqd//vqbrobrK5nsPbkWVe5zp3U2jpyIXVQHyxwlyHpJlmEkQe3sh8uFZYSaQ3JLUDRCWYfsZ58uI/ENFOrUQnRPLYOb/BMQFiB/urZIIwMg3PE4erKdf8DnAI9LurnBsjc61heXxcLJB0ghjpfSpJiZgikDnnvCRYge9vQduXKd63QcZeUaPTS4u/zJS5Q3qRyDCI5sWIKMRlELwr/H9mjhJReLG@12uxn45LDEectBYrLE5EWFEHjm/O5W2PgDWW8u8LV@KkipqQYS7SCRyUwyIXLD9c5/vJO70WhW7@joogt31Ki21CrcPcnBSw5@Jcd/onc@r@EE/yeNWcMGFPi/eErsyCuk0r2k9fFf%23C) - initial language draft
* [2](https://dzaima.github.io/paste/#0lVLBbtNAEL37K96hiCQyiQAJIauAECcu5BDfqhZt7XG6krNrdnbjxFH@nRnbVJQbOU1m3rx57603G5TnjrjINhvYjzns2w85khRJi@b9uxzr9VqnqxLzj@lXIlcRfIMyR2eYqYZhdN66SEHRd@5@xjf2RPUbtgOhtRxfLlmHQHtpU2Bdu3k@8qeNJrXtuJQp4JvhOKkt8QqnGTqe7QLFafAwDyoBY5Ecm4ZghQK3iGJX6U7LGbuewGxiCiZat0fl3ZFCzLJDao/HS1loRDl2t@V14YqdOMkxFCtx0RerXY6T1EsU0Ak@4ZKJaR/Qy0SA8EIGq1Yd9sGnTl1eADn/gySD6MEdVbY549EnV5tgiaXrM9UlqO@ObU2IT5bBlVctA4yrRbgJpGz9k20JPQRw9@V@Ny4eci9iFjcaR7/EqsBp7EsOXs5LWCk40XUdu4NgD@O1r/JMMSod2VCDrb708JrhPFrv9mLGPx6tT9yeYYSnoaBfg9Ao1czrsuuc30/1/3eIObafXya5/Z8kn0PcjvNJ@kL@TTbnUp73Xz2/AQ%23C) (+ [revision](https://dzaima.github.io/paste/#0jc5BCoMwEAXQfU7xV0VDThBqKfQAWSR7kWIhoBGiRjHN3Tup0NJ200VgMvNneP3chRCN0LhDH00qnNR2awU2yY3AIrkWWKkuIZEnqBAZcBt8PV6bLpqEhRK0gCG0HhbWwT0zoF6FSzNOlCqWEvwcfTvN3sEmrJRI9BxLjPWZUecD2SLUrsEB6vRpUv@Y1Jfph6N2zuuzlm/MAw%23C)) - new syntax, pointers instead of arrays
* [3](https://dzaima.github.io/paste/#0nZDBaoQwEIbvPsV/KiqBpVfTLYU@gAe9lVLi7tiKmrQxGqsVelz6ED32wfZJOrJQutc9hEn4J8M3X9s3wzDnIsMHspt8CXWSVRMJTEmcC/gkzgRGvkdIsCbYYg4A1TtTGovQc8zdEhWcgY4wY7OBFyOU3mOCNh4F7UxLGGjnjO1gSvS65kALFL0DvfWqQUP62b0IvJquq4rmHfuqLMmSdvzfeSKNypFVrjK6Q6hqhfp4@Lo@fv4cD9@SUR/qx0wyLNdc8gJrjZgVjLHFverczAv6CPHdbMn1lifKBaPknoWPDpYgaFchT2Ygu1oR6ckLrpDenttJL7NzzpOeeP4eY/QP5hc#JS) - another syntax proposal for `for`
Type checking, dispatching, and allocation are out of scope: they should be handled by the calling language.
Top Answer
Marshall Lochbaum
Fundamental operations and IR:
I think all the operations a typical user would call, like `+` or `*`, should be defined in a library written in Singeli, possibly auto-loaded. There will only be a small number of built-in functions, and in a sense the frontend is just a very extensible language built around these.
- `result = emit{opcode, result type, arg0, arg1, ...}`
- `result = call{function (value or symbol), arg0, arg1, ...}`
- `slot = declareLocal{type}`
- `handle = beginFunction{arg0 type, arg1 type, ...}`
- `endFunction{handle}`
- `handle = beginGoto{direction}`
- `endGoto{handle, condition(?)}`
(Maybe we want something more sophisticated than gotos. I don't know yet. Maybe I'm forgetting functionality as well.)
In addition to their effect on the IR output, these primitives return values to be used in later computation. `result` is a read-only reference to the result of the computation. I think it's easiest to use a static single assignment (SSA) form like LLVM, so that each instruction in the IR of a single unit (function?) is given an ID number, and later instructions can use this number. In this case, a `result` is an ID number, possibly with an additional field to describe in what context it's valid (but maybe we can design our function calls in a way that makes the context unneeded).
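To make the ID-number idea concrete, here is a minimal Python sketch of an emitter whose `emit` returns the ID of the instruction it just appended. The class name `IRBuilder`, the opcode names, and the textual output format are all assumptions for illustration; nothing here is a settled IR design.

```python
class IRBuilder:
    """Toy SSA emitter: every emitted instruction gets a fresh ID number."""

    def __init__(self):
        self.instructions = []  # (id, opcode, result_type, args) in emit order

    def emit(self, opcode, result_type, *args):
        # The ID is the instruction's position in the list. Returning it gives
        # later instructions a read-only handle on this result.
        result_id = len(self.instructions)
        self.instructions.append((result_id, opcode, result_type, args))
        return result_id

    def dump(self):
        lines = []
        for result_id, opcode, result_type, args in self.instructions:
            # In this sketch an int argument is always an ID; names are strings.
            operands = ", ".join(
                f"%{a}" if isinstance(a, int) else str(a) for a in args
            )
            lines.append(f"%{result_id} = {opcode} {result_type} {operands}")
        return lines


ir = IRBuilder()
x = ir.emit("arg", "u32", "x")             # hypothetical opcode names
wide = ir.emit("cast", "u64", x)           # refers to x by its ID
total = ir.emit("add", "u64", wide, wide)  # refers to wide twice, emitted once
```

Because `wide` is just a number, using it twice in the `add` does not duplicate the cast in the output.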
Answer #2
Marshall Lochbaum
Evaluation sketch:
```
# Brackets in an assignment target indicate compile-time arguments.
# Arguments can be unbound names (possibly duplicated), or values.
# Conditions on those arguments are written after, with &.
# : declares a variable and => constructs a function.
safe_double{T & T<u64} = (x:T) => {
y:2*T = Cast{2*T, x}
y + y
}
# Overloading
safe_double{f64} = (x:f64) => x+x
'double_u32' = safe_double{u32}
```
In the compiled output `double_u32` should be exported as a single-instruction function (not counting register moves): it adds its argument to itself with 64-bit addition and then returns that value.
What happens when `safe_double{u32}` is evaluated? This happens during compilation. The environment is keeping an ordered list of definitions for `safe_double`. It tests them in reverse source order to see if one fits the given argument(s): `{f64}` doesn't but `{T & T<u64}` does, since `<` on types means "subtype". This function is then run.
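The lookup described above could be modelled roughly as follows in Python. The `Generic` class, the `is_subtype` rule (same kind letter, strictly narrower width), and the strings returned by the bodies are all assumptions made for this sketch, not part of the proposal.

```python
def is_subtype(a, b):
    """Assumed rule: same kind letter (u, i, f) and strictly narrower width."""
    return a[0] == b[0] and int(a[1:]) < int(b[1:])


class Generic:
    """A compile-time function with several definitions, kept in source order."""

    def __init__(self, name):
        self.name = name
        self.definitions = []  # (predicate, body) pairs

    def define(self, predicate, body):
        self.definitions.append((predicate, body))

    def __call__(self, *args):
        # Test definitions in reverse source order; the first one that fits runs.
        for predicate, body in reversed(self.definitions):
            if predicate(*args):
                return body(*args)
        raise TypeError(f"no definition of {self.name} fits {args}")


safe_double = Generic("safe_double")
# safe_double{T & T<u64}
safe_double.define(
    lambda t: is_subtype(t, "u64"),
    lambda t: f"widen {t}, add as u{2 * int(t[1:])}",
)
# safe_double{f64}
safe_double.define(lambda t: t == "f64", lambda t: "add as f64")

print(safe_double("u32"))  # the f64 definition is tried first and rejected
```

With these two definitions, `u64` itself matches neither case and raises an error, which is the point of calling the function "safe".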
The braces in the definition of `safe_double` don't actually introduce a scope; the scope starts after `=` when a compile-time function is defined. So `x:T` and `y:2*T` occur in the same scope.
`x:T` defines the value `x`, and sets its value to be a compile-time typed variable slot—in the compiled code this will probably be a particular register. The operator `=>` takes a list of these variable slots on the left and source code on the right. Operators always take their arguments at compile time, although an operator might alias to a runtime function (for example `x ** y = fn(x,y)`). I don't know exactly how source blocks should work, but they should be passed unevaluated as arguments so that functions can control how they are evaluated or evaluate them multiple times (for example, for loop unrolling).
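One way to picture a source block that is passed unevaluated is as a closure handed to a compile-time function, which then decides how many times to run it. The Python sketch below uses that analogy for loop unrolling; `emit`, `unroll`, and the instruction text are hypothetical stand-ins, and the real mechanism for source blocks is still open.

```python
emitted = []  # stand-in for the IR output


def emit(text):
    emitted.append(text)


def unroll(count, block):
    # The block arrives unevaluated, so this function controls how often it
    # runs. Each run emits its own copy of the block's instructions.
    for i in range(count):
        block(i)


# The lambda plays the role of the source block; it only emits when called.
unroll(4, lambda i: emit(f"store out[{i}], load in[{i}]"))
```

After the call, `emitted` holds four copies of the store, one per unrolled iteration.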
What `=>` does is to emit a function header (whatever that looks like in IR), and then just call the source code block. The source code creates a variable slot `y:2*T`, which works the same as the slot for `x`; `2*T` doubles the size of type `T`. Then it calls `Cast{2*T, x}`, which does two things: it *emits* a cast instruction (which should probably exist in the IR, but be eliminated later, as 32-bit unsigned ints should already be stored as 64-bit in x86), and *returns* a handle for the result of this instruction. By passing this handle instead of a symbolic representation of the computation, the compiler ensures that a given piece of source code doesn't get duplicated in the compiled output. Then `=` moves the result (using its handle) into register `y`. Here `=` is doing something very different from the top-level `=`s that define `safe_double`; they are distinguished by the left argument but maybe they should have different symbols. `y + y` is another computation; it emits an add instruction and returns a handle for the result. Finally the function code moves this value to a result register, or otherwise marks it as a result in the IR output.
After `safe_double{u32}` is evaluated, `'double_u32' = ` exports it as a function. The value `'double_u32'` is a symbol, in the C library sense.
Answer #3
Marshall Lochbaum
Overflow handling:
* Default is to wrap
* Saturation
* Get high/low halves
* Call different code if any (masking?) value overflows
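As a scalar illustration of the options above, here is a Python sketch for unsigned 8-bit lanes. The function names are made up for this example, and the "call different code" case is modelled only as returning a per-lane overflow mask that a caller could test.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 255, the largest u8 value


def add_wrap(a, b):
    # Default: keep only the low 8 bits of the sum.
    return (a + b) & MASK


def add_saturate(a, b):
    # Clamp to the largest representable value instead of wrapping.
    return min(a + b, MASK)


def mul_halves(a, b):
    # Return the full 16-bit product as separate (high, low) 8-bit halves.
    full = a * b
    return full >> BITS, full & MASK


def add_checked(xs, ys):
    # Wrapped sums plus a mask saying which lanes overflowed.
    sums = [add_wrap(x, y) for x, y in zip(xs, ys)]
    overflowed = [x + y > MASK for x, y in zip(xs, ys)]
    return sums, overflowed


sums, overflowed = add_checked([1, 200], [2, 100])
if any(overflowed):
    print("some lane overflowed; a wider fallback could run here")
```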
Answer #4
Marshall Lochbaum
Ways to convert between types:
* Reinterpret, if they're the same size
* Cast each value (lossy for decreasing size)
* Saturating cast, sometimes
* Replicate entire value to fit a larger type
* Replicate individual values
* Select one element, or a slice
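The conversions listed above can be sketched on plain Python values and lists. The helper names are hypothetical, and the reading of "replicate entire value" as repeating the whole sequence until it fills a larger container is my interpretation of the bullet, not something the list spells out.

```python
import struct


def reinterpret_f32_as_u32(x):
    # Same 32 bits, different type: no numeric conversion happens.
    return struct.unpack("<I", struct.pack("<f", x))[0]


def cast_u8(x):
    # Lossy when narrowing: the high bits are dropped.
    return x & 0xFF


def saturate_u8(x):
    # Clamp into the u8 range instead of dropping bits.
    return max(0, min(x, 0xFF))


def replicate_whole(values, times):
    # [1, 2] -> [1, 2, 1, 2]: the entire value is repeated.
    return list(values) * times


def replicate_each(values, times):
    # [1, 2] -> [1, 1, 2, 2]: each element is repeated in place.
    return [v for v in values for _ in range(times)]


lanes = [10, 20, 30, 40]
one = lanes[2]      # select one element
part = lanes[1:3]   # select a slice
```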
Answer #5
Marshall Lochbaum
Type operations:
* `*T`: sequence of `T`, passed as pointer
* `[n]T`: fixed-size list of `T`, passed in registers
* `$T`: register full of `T`
Answer #6
Marshall Lochbaum
Atomic types:
* `i8`/`i16`/`i32`/`i64`
* `u8`/`u16`/`u32`/`u64`
* `u1`
* `f32`/`f64`