Mixed-precision vector API
- group vect_mixed_api
Functions
-
void mat_mul_s8_x_s16_yield_s32(int32_t output[], const int8_t matrix[], const int16_t input_vect[], const unsigned M_rows, const unsigned N_cols, int8_t scratch[])
Multiply an 8-bit matrix by a 16-bit vector for a 32-bit result vector.
This function multiplies an 8-bit \(M \times N\) matrix \(\bar W\) by a 16-bit \(N\) -element column vector \(\bar v\) and returns the result as a 32-bit \(M\) -element vector \(\bar a\) .
`output` is the output vector \(\bar a\) .
`matrix` is the matrix \(\bar W\) .
`input_vect` is the vector \(\bar v\) .
`matrix` and `input_vect` must both begin at word-aligned addresses.
`M_rows` and `N_cols` are the dimensions \(M\) and \(N\) of matrix \(\bar W\) . \(M\) must be a multiple of 16, and \(N\) must be a multiple of 32.
`scratch` is a pointer to a word-aligned buffer that this function may use to store intermediate results. This buffer must be at least \(N\) bytes long.
The result of this multiplication is exact, so long as saturation does not occur.
- Parameters:
output – [inout] The output vector \(\bar a\)
matrix – [in] The weight matrix \(\bar W\)
input_vect – [in] The input vector \(\bar v\)
M_rows – [in] The number of rows \(M\) in matrix \(\bar W\)
N_cols – [in] The number of columns \(N\) in matrix \(\bar W\)
scratch – [in] Scratch buffer required by this function.
- Throws ET_LOAD_STORE:
Raised if `matrix` or `input_vect` is not word-aligned (See Note: Vector Alignment)
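The requirements above can be sketched in portable C. This is an illustrative reference model, not the library's optimized routine. The names `ref_mat_mul_s8_x_s16`, `mat_mul_args_ok`, `is_word_aligned` and `WORD_BYTES` are hypothetical. The sketch assumes a 4-byte word, a row-major layout for `matrix`, and saturation at the `int32_t` limits, none of which is stated in the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES 4u /* assumption: a "word" is 32 bits */

/* Hypothetical helper: nonzero if p sits on a word boundary. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % WORD_BYTES) == 0;
}

/* Hypothetical helper: check the documented preconditions before a call.
 * scratch_bytes is the size of the caller's scratch buffer. */
static int mat_mul_args_ok(const int8_t matrix[], const int16_t input_vect[],
                           const int8_t scratch[], size_t scratch_bytes,
                           unsigned M_rows, unsigned N_cols)
{
    return is_word_aligned(matrix) && is_word_aligned(input_vect)
        && is_word_aligned(scratch)
        && (M_rows % 16u == 0)        /* M must be a multiple of 16 */
        && (N_cols % 32u == 0)        /* N must be a multiple of 32 */
        && (scratch_bytes >= N_cols); /* scratch holds at least N bytes */
}

/* Reference model of the arithmetic only (needs no scratch buffer).
 * Assumption: element (r, c) of the matrix is matrix[r * N_cols + c]. */
static void ref_mat_mul_s8_x_s16(int32_t output[], const int8_t matrix[],
                                 const int16_t input_vect[],
                                 unsigned M_rows, unsigned N_cols)
{
    for (size_t r = 0; r < M_rows; r++) {
        int64_t acc = 0; /* wide accumulator: the running sum cannot wrap */
        for (size_t c = 0; c < N_cols; c++) {
            acc += (int64_t)matrix[r * N_cols + c] * input_vect[c];
        }
        /* Assumption: results saturate at the int32_t limits. */
        if (acc > INT32_MAX) acc = INT32_MAX;
        if (acc < INT32_MIN) acc = INT32_MIN;
        output[r] = (int32_t)acc;
    }
}
```

A model like this can serve as a host-side cross-check for the optimized function, provided the layout and saturation assumptions are first confirmed against the library.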
-
unsigned vect_sXX_add_scalar(int32_t a[], const int32_t b[], const unsigned length_bytes, const int32_t c, const int32_t d, const right_shift_t b_shr, const unsigned mode_bits)
Add a scalar to a vector.
This works for 8-, 16- or 32-bit vectors, real or complex.
`length_bytes` is the total number of bytes to be output. So, for 16-bit vectors, `length_bytes` is twice the number of elements, whereas for complex 32-bit vectors, `length_bytes` is 8 times the number of elements.
`c` and `d` are the values that populate the internal buffer to be added to the input vector, as follows: internally, an 8-word (32-byte) buffer is allocated on the stack. Even-indexed words are populated with `c` and odd-indexed words are populated with `d`. For real vectors, `c` and `d` should be the same value; `d` exists so that this same function can work for complex 32-bit vectors. This also means that for 16-bit vectors, the value to be added needs to be duplicated in both the upper 2 bytes and the lower 2 bytes of the word.
`mode_bits` should be `0x0000` for 32-bit mode, `0x0100` for 16-bit mode or `0x0200` for 8-bit mode.
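The argument conventions above can be illustrated with a few small helpers. This is a sketch only: every name below (`length_bytes_for`, `splat_s16`, `splat_s8`, `fill_scalar_buffer` and the `MODE_*` constants) is hypothetical and not part of the library. Replicating an 8-bit scalar into all four bytes of the word is an assumption, extrapolated from the 16-bit rule stated above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the documented mode_bits values. */
enum {
    MODE_S32 = 0x0000,
    MODE_S16 = 0x0100,
    MODE_S8  = 0x0200
};

/* Total bytes to be output: elements * bytes per component,
 * doubled for complex vectors (real and imaginary parts). */
static unsigned length_bytes_for(unsigned elements, unsigned bits, int is_complex)
{
    return elements * (bits / 8u) * (is_complex ? 2u : 1u);
}

/* Reinterpret a 32-bit pattern as int32_t without relying on
 * implementation-defined integer conversion. */
static int32_t word_from_bits(uint32_t w)
{
    int32_t out;
    memcpy(&out, &w, sizeof out);
    return out;
}

/* Duplicate a 16-bit scalar into the upper and lower halves of a word. */
static int32_t splat_s16(int16_t x)
{
    uint32_t h = (uint16_t)x;
    return word_from_bits((h << 16) | h);
}

/* Assumption: an 8-bit scalar is replicated into all four bytes. */
static int32_t splat_s8(int8_t x)
{
    uint32_t b = (uint8_t)x;
    return word_from_bits(b * 0x01010101u);
}

/* Model of the 8-word internal buffer: even-indexed words hold c,
 * odd-indexed words hold d. */
static void fill_scalar_buffer(int32_t buf[8], int32_t c, int32_t d)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (i % 2 == 0) ? c : d;
    }
}
```

Under these assumptions, adding the value 3 to a real 16-bit vector of 10 elements would use `length_bytes_for(10, 16, 0)` (20 bytes), pass `splat_s16(3)` as both `c` and `d`, and pass `MODE_S16` as `mode_bits`.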
-
void mat_mul_s8_x_s16_yield_s32(int32_t output[], const int8_t matrix[], const int16_t input_vect[], const unsigned M_rows, const unsigned N_cols, int8_t scratch[])