32-Bit Vector Chunk (8-Element) API

group chunk32_api

Functions

int32_t chunk_s32_dot(const int32_t b[VPU_INT32_EPV], const q2_30 c[VPU_INT32_EPV])

Compute the inner product between two vector chunks.

This function computes the inner product of two vector chunks, \(\bar b\) and \(\bar c\) .

Conceptually, elements of \(\bar b\) may have any number of fractional bits (integers, fixed-point values, or mantissas of a BFP vector), so long as all elements share the same number. Elements of \(\bar c\) are Q2.30 fixed-point values. Consequently, the returned value \(a\) has the same number of fractional bits as \(\bar b\) .

Only the lowest 32 bits of the sum \(a\) are returned.

Operation Performed

\[\begin{aligned} & a \leftarrow \sum_{k=0}^{\mathtt{VPU\_INT32\_EPV}-1} \left( round\left( \frac{b_k\cdot{}c_k}{2^{30}} \right) \right) \end{aligned}\]

Parameters:

b – [in] Input chunk \(\bar b\)
c – [in] Input chunk \(\bar c\)

Returns:

\(a\)
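
The operation above can be modelled in portable C. The following is a minimal sketch, not the library's implementation: `ref_chunk_s32_dot` and `EPV` are hypothetical names, ties are assumed to round upward, and `>>` on a negative `int64_t` is assumed to be an arithmetic shift (true on mainstream compilers, but implementation-defined in C).

```c
#include <stdint.h>

#define EPV 8  /* stand-in for VPU_INT32_EPV (8 elements per chunk) */

/* Hypothetical reference model of the documented operation. */
int32_t ref_chunk_s32_dot(const int32_t b[EPV], const int32_t c[EPV])
{
    uint32_t acc = 0;  /* unsigned, so overflow wraps and keeps the low 32 bits */
    for (int k = 0; k < EPV; k++) {
        int64_t prod = (int64_t)b[k] * (int64_t)c[k];
        /* round(prod / 2^30): add half an LSB, then shift right by 30.
           Assumption: >> on a negative int64_t is an arithmetic shift. */
        int64_t term = (prod + ((int64_t)1 << 29)) >> 30;
        acc += (uint32_t)term;
    }
    return (int32_t)acc;  /* reinterpret the low 32 bits as signed */
}
```

With every \(c_k\) set to `1 << 30` (1.0 in Q2.30), the model reduces to a plain sum of the elements of \(\bar b\), which makes it easy to sanity-check against hand-computed values.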

void chunk_s32_log(q8_24 a[VPU_INT32_EPV], const int32_t b[VPU_INT32_EPV], const exponent_t b_exp)

Compute the natural log of a vector chunk of 32-bit values.

This function computes the natural logarithm of each of the 8 elements in vector chunk \(\bar b\) . The result is returned as an 8-element chunk \(\bar a\) of Q8.24 values.

b_exp is the exponent associated with elements of \(\bar b\) .

Any input \(b_k \le 0\) will result in a corresponding output \(a_k = \mathtt{INT32\_MIN}\) .

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \ \begin{cases} \ln(b_k\cdot{}2^{\mathtt{b\_exp}}) & b_k > 0 \\ \mathtt{INT32\_MIN} & \text{otherwise} \\ \end{cases} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

Parameters:

a – [out] Output vector chunk \(\bar a\)
b – [in] Input vector chunk \(\bar b\)
b_exp – [in] Exponent associated with \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double-word-aligned (See Note: Vector Alignment)
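
Two practical points follow from the description above: both chunks must sit on a double-word boundary, and each Q8.24 output must be checked for the INT32_MIN marker before it is interpreted. The sketch below illustrates both in portable C11. It assumes a double word is 8 bytes; `q8_24_t`, `EPV`, `is_dword_aligned`, `q8_24_to_double`, `a_chunk` and `b_chunk` are hypothetical names introduced for illustration, and the library call itself is deliberately left out.

```c
#include <stdalign.h>
#include <stdint.h>

#define EPV 8             /* stand-in for VPU_INT32_EPV */
typedef int32_t q8_24_t;  /* hypothetical local stand-in for the library's q8_24 */

/* Buffers forced onto an 8-byte boundary (assumed double-word size). */
alignas(8) int32_t b_chunk[EPV];
alignas(8) q8_24_t a_chunk[EPV];

/* Returns 1 if p sits on an 8-byte boundary, 0 otherwise. */
int is_dword_aligned(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}

/* Converts one Q8.24 result to double.
   Returns 0 and leaves *out untouched when the element is the INT32_MIN
   marker that the documentation describes for inputs b_k <= 0. */
int q8_24_to_double(q8_24_t a, double *out)
{
    if (a == INT32_MIN)
        return 0;
    *out = (double)a / 16777216.0;  /* divide by 2^24 */
    return 1;
}
```

A caller would fill `b_chunk`, pass both buffers to the log function, and then run `q8_24_to_double` over `a_chunk`, skipping any element for which it returns 0.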

void chunk_float_s32_log(q8_24 a[VPU_INT32_EPV], const float_s32_t b[VPU_INT32_EPV])

Compute the natural log of a vector chunk of float_s32_t.

This function computes the natural logarithm of each of the VPU_INT32_EPV elements in vector chunk \(\bar b\) . The result is returned as an 8-element chunk \(\bar a\) of Q8.24 values.

Any input \(b_k \le 0\) will result in a corresponding output \(a_k = \mathtt{INT32\_MIN}\) .

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \ \begin{cases} \ln(b_k) & b_k > 0 \\ \mathtt{INT32\_MIN} & \text{otherwise} \\ \end{cases} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

Parameters:

a – [out] Output vector chunk \(\bar a\)
b – [in] Input vector chunk \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double-word-aligned (See Note: Vector Alignment)
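
As a worked illustration, assume each float_s32_t element represents a mantissa \(m_k\) scaled by its own exponent \(e_k\) (the symbols \(m_k\) and \(e_k\) are introduced here and are not part of the API description). Unlike chunk_s32_log(), where all elements share b_exp, every element then carries its own exponent, and the Q8.24 output is ideally:

\[\begin{aligned} & b_k = m_k\cdot{}2^{e_k} \\ & \ln(b_k) = \ln(m_k) + e_k\cdot{}\ln(2) \\ & a_k \approx round\left( \ln(b_k)\cdot{}2^{24} \right) \end{aligned}\]

For example, \(m_k = 1\) and \(e_k = 1\) give \(\ln(2) \approx 0.693147\) , so the ideal output is \(a_k \approx 11629080\) . The value actually returned may differ in its least significant bits.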

void chunk_q30_power_series(int32_t a[VPU_INT32_EPV], const q2_30 b[VPU_INT32_EPV], const int32_t c[], const unsigned term_count)

Compute a power series on a vector chunk of Q2.30 values.

This function is used to compute a power series summation on a vector chunk (VPU_INT32_EPV-element vector) \(\bar b\) . \(\bar b\) contains Q2.30 values. \(\bar c\) is a vector containing coefficients to be multiplied by powers of \(\bar b\) , and may have any associated exponent. The output is vector chunk \(\bar a\) and has the same exponent as \(\bar c\) .

c[] is an array with shape (term_count, VPU_INT32_EPV), where the second axis contains the same value replicated across all VPU_INT32_EPV elements. That is, c[k][i] = c[k][j] for i and j in 0..(VPU_INT32_EPV-1). This is for performance reasons. (For the purpose of this explanation, \(\bar c\) is considered to be single-dimensional, without redundancy.)

Operation Performed

\[\begin{split}\begin{aligned} & b_{k,0} = 2^{30} \\ & b_{k,i} = round\left(\frac{b_{k,i-1}\cdot{}b_k}{2^{30}}\right) \\ & \qquad\text{for }i \in {1..(N-1)} \\ & a_k \leftarrow \sum_{i=0}^{N-1} round\left( \frac{b_{k,i}\cdot c_i}{2^{30}} \right) \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

Parameters:

a – [out] Output vector chunk \(\bar a\)
b – [in] Input vector chunk \(\bar b\)
c – [in] Coefficient vector \(\bar c\)
term_count – [in] Number of power series terms, \(N\)
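
The coefficient layout and the recurrence above can be sketched in portable C. This is a hedged reference model, not the library's implementation: all function and macro names are hypothetical, the table is assumed to be flattened row-major (`c[i * EPV + j]`), ties are assumed to round upward, and `>>` on negative values is assumed arithmetic. The model does not saturate, so it restricts inputs to \(|b_k\cdot{}2^{-30}| \le 1\) .

```c
#include <assert.h>
#include <stdint.h>

#define EPV 8                       /* stand-in for VPU_INT32_EPV */
#define ONE_Q30 ((int64_t)1 << 30)  /* 1.0 in Q2.30 */

/* round(x / 2^30), assuming >> on a negative int64_t is arithmetic. */
int64_t round_shr30(int64_t x)
{
    return (x + ((int64_t)1 << 29)) >> 30;
}

/* Builds the (term_count, EPV) table by replicating each coefficient
   across one row. Assumption: rows are stored contiguously (row-major). */
void fill_coef_table(int32_t table[], const int32_t coef[], unsigned term_count)
{
    for (unsigned i = 0; i < term_count; i++)
        for (unsigned j = 0; j < EPV; j++)
            table[i * EPV + j] = coef[i];
}

/* Hypothetical reference model of the documented operation. */
void ref_q30_power_series(int32_t a[EPV], const int32_t b[EPV],
                          const int32_t c[], unsigned term_count)
{
    for (unsigned k = 0; k < EPV; k++) {
        /* Limitation of this sketch only: no saturation is modelled. */
        assert(b[k] >= -ONE_Q30 && b[k] <= ONE_Q30);

        int64_t pw = ONE_Q30;  /* b_{k,0} = 2^30 */
        int64_t acc = 0;
        for (unsigned i = 0; i < term_count; i++) {
            acc += round_shr30(pw * c[i * EPV + k]);  /* term i of the sum */
            pw = round_shr30(pw * b[k]);              /* b_{k,i+1} */
        }
        a[k] = (int32_t)acc;
    }
}
```

For instance, coefficients 1.0, 1.0 and 0.5 (as Q2.30 values) evaluated at \(b_k = 0.5\) yield \(1 + 0.5 + 0.125 = 1.625\) in the same format as the coefficients.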

void chunk_q30_exp_small(q2_30 a[VPU_INT32_EPV], const q2_30 b[VPU_INT32_EPV])

Compute \(e^b\) on a vector chunk of Q2.30 values.

This function computes \(e^{b_k}\) for each element of a vector chunk (VPU_INT32_EPV-element vector) \(\bar b\) of Q2.30 values near \(0\) . The result is computed using the power series approximation of \(e^x\) near zero. It is recommended that this function only be used for \( -0.5 \le b_k\cdot{}2^{-30} \le 0.5\) .

The output vector chunk \(\bar a\) is also in a Q2.30 format.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow e^{b_k\cdot{}2^{-30}} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

Parameters:

a – [out] Output vector chunk \(\bar a\)
b – [in] Input vector chunk \(\bar b\)
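
Because both input and output are Q2.30, callers typically need to convert to and from that format and to check the recommended input range. The helpers below are a minimal sketch in portable C; `q2_30_t`, `double_to_q2_30`, `q2_30_to_double` and `in_recommended_range` are hypothetical names, not library functions.

```c
#include <stdint.h>

typedef int32_t q2_30_t;  /* hypothetical local stand-in for the library's q2_30 */

/* Converts x to Q2.30, rounding to nearest.
   The caller must keep x within [-2.0, 2.0) so the result fits in 32 bits. */
q2_30_t double_to_q2_30(double x)
{
    double scaled = x * 1073741824.0;  /* multiply by 2^30 */
    return (q2_30_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Converts a Q2.30 value back to double. */
double q2_30_to_double(q2_30_t v)
{
    return (double)v / 1073741824.0;  /* divide by 2^30 */
}

/* Returns 1 if v lies in the recommended range -0.5 <= v * 2^-30 <= 0.5. */
int in_recommended_range(q2_30_t v)
{
    const q2_30_t half = (q2_30_t)1 << 29;  /* 0.5 in Q2.30 */
    return v >= -half && v <= half;
}
```

Within the recommended range the true result stays below \(e^{0.5} \approx 1.649\) , which is inside the Q2.30 upper bound of 2.0, so the output format does not overflow for such inputs.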