Browse Source

Fix comments

al-update-winterfell
Paul-Henry Kajfasz 8 months ago
parent
commit
b1cb2b6ec3
No known key found for this signature in database GPG Key ID: 5EB89DCE97DB7417
1 changed files with 21 additions and 20 deletions
  1. +21
    -20
      src/hash/rescue/arch/x86_64_avx2.rs

+ 21
- 20
src/hash/rescue/arch/x86_64_avx2.rs

@ -5,36 +5,37 @@ use core::arch::x86_64::*;
// Preliminary notes: // Preliminary notes:
// 1. AVX does not support addition with carry but 128-bit (2-word) addition can be easily emulated. // 1. AVX does not support addition with carry but 128-bit (2-word) addition can be easily emulated.
// The method recognizes that for a + b overflowed iff (a + b) < a:
// - i. res_lo = a_lo + b_lo
// - ii. carry_mask = res_lo < a_lo
// - iii. res_hi = a_hi + b_hi - carry_mask Notice that carry_mask is subtracted, not added.
// The method recognizes that for a + b overflowed iff (a + b) < a: i. res_lo = a_lo + b_lo ii.
// carry_mask = res_lo < a_lo iii. res_hi = a_hi + b_hi - carry_mask
//
// Notice that carry_mask is subtracted, not added.
//
// This is because AVX comparison instructions return -1 (all bits 1) for true and 0 for false. // This is because AVX comparison instructions return -1 (all bits 1) for true and 0 for false.
// //
// 2. AVX does not have unsigned 64-bit comparisons. Those can be emulated with signed comparisons // 2. AVX does not have unsigned 64-bit comparisons. Those can be emulated with signed comparisons
// by recognizing that a <u b iff a + (1 << 63) <s b + (1 << 63), where the addition wraps around // by recognizing that a <u b iff a + (1 << 63) <s b + (1 << 63), where the addition wraps around
// and the comparisons are unsigned and signed respectively. The shift function adds/subtracts 1 // and the comparisons are unsigned and signed respectively. The shift function adds/subtracts 1
// << 63 to enable this trick. Addition with carry example:
// - i. a_lo_s = shift(a_lo)
// - ii. res_lo_s = a_lo_s + b_lo
// - iii. carry_mask = res_lo_s <s a_lo_s
// - iv. res_lo = shift(res_lo_s)
// - v. res_hi = a_hi + b_hi - carry_mask The suffix _s denotes a value that has been
// shifted by 1 << 63.
// << 63 to enable this trick. Addition with carry example: i. a_lo_s = shift(a_lo) ii. res_lo_s
// = a_lo_s + b_lo iii. carry_mask = res_lo_s <s a_lo_s iv. res_lo = shift(res_lo_s) v. res_hi =
// a_hi + b_hi - carry_mask
// //
// The result of addition is shifted if exactly one of the operands is shifted, as is the case on
// The suffix _s denotes a value that has been shifted by 1 << 63. The result of addition
// is shifted if exactly one of the operands is shifted, as is the case on
// line ii. Line iii. performs a signed comparison res_lo_s <s a_lo_s on shifted values to // line ii. Line iii. performs a signed comparison res_lo_s <s a_lo_s on shifted values to
// emulate unsigned comparison res_lo <u a_lo on unshifted values. Finally, line iv. reverses the // emulate unsigned comparison res_lo <u a_lo on unshifted values. Finally, line iv. reverses the
// shift so the result can be returned. When performing a chain of calculations, we can often
// shift so the result can be returned.
//
// When performing a chain of calculations, we can often
// save instructions by letting the shift propagate through and only undoing it when necessary. // save instructions by letting the shift propagate through and only undoing it when necessary.
// For example, to compute the addition of three two-word (128-bit) numbers we can do: // For example, to compute the addition of three two-word (128-bit) numbers we can do:
// - i. a_lo_s = shift(a_lo)
// - ii. tmp_lo_s = a_lo_s + b_lo
// - iii. tmp_carry_mask = tmp_lo_s <s a_lo_s
// - iv. tmp_hi = a_hi + b_hi - tmp_carry_mask
// - v. res_lo_s = tmp_lo_s + c_lo vi. res_carry_mask = res_lo_s <s tmp_lo_s
// - vii. res_lo = shift(res_lo_s)
// - viii. res_hi = tmp_hi + c_hi - res_carry_mask
// i. a_lo_s = shift(a_lo)
// ii. tmp_lo_s = a_lo_s + b_lo
// iii. tmp_carry_mask = tmp_lo_s <s a_lo_s
// iv. tmp_hi = a_hi + b_hi - tmp_carry_mask
// v. res_lo_s = tmp_lo_s + c_lo vi. res_carry_mask = res_lo_s <s tmp_lo_s
// vii. res_lo = shift(res_lo_s)
// viii. res_hi = tmp_hi + c_hi - res_carry_mask
//
// Notice that the above 3-value addition still only requires two calls to shift, just like our // Notice that the above 3-value addition still only requires two calls to shift, just like our
// 2-value addition. // 2-value addition.

Loading…
Cancel
Save