32-bit scalar API
- group scalar_s32_api
Defines
-
S32_SQRT_MAX_DEPTH
Maximum bit-depth to calculate with s32_sqrt().
Functions
-
float s32_to_f32(const int32_t mantissa, const exponent_t exp)
Pack a floating point value into an IEEE 754 single-precision float.
The value returned is the nearest representable approximation to \( m \cdot 2^{p} \), where \(m\) is mantissa and \(p\) is exp.
- Example
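```c
#include <inttypes.h>
#include <stdio.h>

// Pack -12345678 * 2^(-13) into a float
int32_t mant = -12345678;
exponent_t exp = -13;
float val = s32_to_f32(mant, exp);
// PRId32 matches int32_t portably; exponent is cast for "%d"
printf("%e <-- %" PRId32 " * 2^(%d)\n", val, mant, (int) exp);
```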
Note
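This operation may result in a loss of precision. An IEEE 754 single-precision float has only a 24-bit significand, while mantissa may carry up to 31 significant bits.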
- Parameters:
mantissa – [in] Mantissa of value to be packed
exp – [in] Exponent of value to be packed
- Returns:
float representation of the input value
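The following is a minimal reference model of the "nearest representable approximation" behaviour. It is hypothetical and is not the library's implementation. The function name ref_s32_to_f32 is made up for illustration, and the sketch assumes exponent_t is a plain signed integer, written here as int. The result is computed in double precision so that the only rounding happens once, in the final conversion to float.

```c
#include <stdint.h>

/* Hypothetical reference model (not the library's code): the float nearest to
 * mantissa * 2^exp. An int32_t is exact in a double, and scaling a double by
 * powers of two is exact while it stays in double's normal range, so the only
 * rounding is the final double -> float conversion. */
static float ref_s32_to_f32(int32_t mantissa, int exp)
{
    double d = (double) mantissa;

    /* Beyond these limits a nonzero result overflows to infinity or underflows
     * to zero as a float anyway, so clamping only keeps the loops short. */
    if (exp > 1200)  exp = 1200;
    if (exp < -1200) exp = -1200;

    for (; exp > 0; exp--) d *= 2.0;
    for (; exp < 0; exp++) d *= 0.5;

    return (float) d;   /* single rounding step (round-to-nearest by default) */
}
```

This model also illustrates the Note on precision. 16777217 is \(2^{24}+1\), which needs 25 significant bits, so it comes back as 16777216.0f.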
-
int16_t s32_to_s16(exponent_t *a_exp, const int32_t b, const exponent_t b_exp)
Convert a 32-bit floating-point scalar to a 16-bit floating-point scalar.
Converts a 32-bit floating-point scalar, represented by the 32-bit mantissa b and exponent b_exp, into a 16-bit floating-point scalar, represented by the 16-bit returned mantissa and output exponent a_exp.
- Parameters:
a_exp – [out] Output exponent
b – [in] 32-bit input mantissa
b_exp – [in] Input exponent
- Returns:
16-bit output mantissa
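One way to picture this conversion is to shift the mantissa right until it fits in 16 bits and add the shift count to the exponent. The sketch below does exactly that. It is hypothetical (ref_s32_to_s16 is an illustrative name), and it assumes exponents are plain ints and that discarded bits are truncated toward zero. The library may round, or may normalize small inputs differently.

```c
#include <stdint.h>

/* Hypothetical sketch (not the library's implementation): shift b right just
 * far enough to fit in 16 bits, crediting the shift to the exponent so that
 * a * 2^a_exp still approximates b * 2^b_exp. Dividing by 2 truncates toward
 * zero; the library's rounding behaviour may differ. */
static int16_t ref_s32_to_s16(int *a_exp, int32_t b, int b_exp)
{
    int shr = 0;
    while (b > INT16_MAX || b < INT16_MIN) {
        b /= 2;
        shr++;
    }
    *a_exp = b_exp + shr;
    return (int16_t) b;
}
```

For example, 0x12345678 needs a 14-bit shift: the result is 18641 with exponent 14, and the 14 discarded bits are the precision lost.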
-
int32_t s32_sqrt(exponent_t *a_exp, const int32_t b, const exponent_t b_exp, const unsigned depth)
Compute the square root of a 32-bit floating-point scalar.
b and b_exp together represent the input \(b \cdot 2^{b\_exp}\). Likewise, a and a_exp together represent the result \(a \cdot 2^{a\_exp}\).
depth is the number of most significant bits (MSbs) of the result to calculate. Smaller values execute more quickly at the cost of reduced precision. The maximum valid value for depth is S32_SQRT_MAX_DEPTH.
- Operation Performed
- \[\begin{aligned} a \cdot 2^{a\_exp} \leftarrow \sqrt{\left( b \cdot 2^{b\_exp} \right)} \end{aligned}\]
- Parameters:
a_exp – [out] Output exponent \(a\_exp\)
b – [in] Input mantissa \(b\)
b_exp – [in] Input exponent \(b\_exp\)
depth – [in] Number of most significant bits to calculate
- Returns:
Output mantissa \(a\)
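To make the depth parameter concrete, here is a minimal bit-by-bit (restoring) square-root sketch. It is hypothetical: ref_s32_sqrt is an illustrative name, and the sketch is not claimed to be the library's algorithm. It assumes b > 0 and plain int exponents. It caps depth at 31 because that is all the result can hold in this sketch; the actual value of S32_SQRT_MAX_DEPTH is not assumed. The input is first normalized so that its exponent is even, which is what lets the exponent be halved exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch (not the library's code). With X = b << s and
 * (b_exp - s) even:  sqrt(b * 2^b_exp) = sqrt(X) * 2^((b_exp - s) / 2).
 * sqrt(X) is built from the MSb down; only the first `depth` bits are tried. */
static int32_t ref_s32_sqrt(int *a_exp, int32_t b, int b_exp, unsigned depth)
{
    assert(b > 0);
    if (depth > 31) depth = 31;           /* the result has at most 31 magnitude bits */

    int h = 30;                           /* index of b's highest set bit */
    while (!((uint32_t) b & (UINT32_C(1) << h))) h--;

    int s = 61 - h;                       /* top bit of X at bit 61 ... */
    if ((b_exp - s) % 2 != 0) s--;        /* ... or bit 60, keeping the exponent even */
    uint64_t X = (uint64_t) b << s;       /* 2^60 <= X < 2^62 */

    uint64_t r = 0;                       /* 2^30 <= sqrt(X) < 2^31 */
    for (unsigned i = 0; i < depth; i++) {
        uint64_t cand = r | (UINT64_C(1) << (30 - i));
        if (cand * cand <= X) r = cand;   /* keep the bit if it does not overshoot */
    }

    *a_exp = (b_exp - s) / 2;
    return (int32_t) r;
}
```

The effect of depth shows up with \(\sqrt{2}\). With 31 bits, the sketch returns 1518500249 with exponent -30, which is approximately 1.41421. With depth = 1, only the leading bit survives, and the result is \(2^{30} \cdot 2^{-30} = 1.0\).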
-
int32_t s32_inverse(exponent_t *a_exp, const int32_t b)
Compute the inverse (reciprocal) of a 32-bit integer.
b represents the integer \(b\). a and a_exp together represent the result \(a \cdot 2^{a\_exp}\).
- Operation Performed
- \[\begin{aligned} a \cdot 2^{a\_exp} \leftarrow \frac{1}{b} \end{aligned}\]
If \(b\) is the mantissa of a fixed- or floating-point value with an implicit or explicit exponent \(b\_exp\), then
- Fixed- or Floating-point
\( \begin{aligned} \frac{1}{b \cdot 2^{b\_exp}} &= \frac{1}{b} \cdot 2^{-b\_exp} \\ &= a \cdot 2^{a\_exp} \cdot 2^{-b\_exp} \\ &= a \cdot 2^{a\_exp - b\_exp} \end{aligned} \)
and so \(b\_exp\) should be subtracted from the output exponent \(a\_exp\).
- Parameters:
a_exp – [out] Output exponent \(a\_exp\)
b – [in] Input integer \(b\)
- Returns:
Output mantissa \(a\)
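A minimal sketch of one way to produce \(a\) and \(a\_exp\) is shown below. It is hypothetical and not the library's implementation; ref_s32_inverse is an illustrative name, and exponents are assumed to be plain ints. The sketch divides \(2^k\) by \(b\) in 64-bit arithmetic. It chooses \(k\) from the position of the highest set bit of \(|b|\), so the quotient's magnitude lands in \((2^{29}, 2^{30}]\).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch (not the library's implementation): 1/b as a * 2^a_exp. */
static int32_t ref_s32_inverse(int *a_exp, int32_t b)
{
    assert(b != 0);

    /* |b| in unsigned arithmetic so that INT32_MIN is handled */
    uint32_t mag = (b < 0) ? (0u - (uint32_t) b) : (uint32_t) b;

    int h = 31;                                  /* index of the highest set bit of |b| */
    while (!(mag & (UINT32_C(1) << h))) h--;

    int k = 30 + h;                              /* 2^h <= |b| < 2^(h+1) => 2^29 < 2^k/|b| <= 2^30 */
    *a_exp = -k;
    return (int32_t) ((INT64_C(1) << k) / b);    /* C division truncates toward zero */
}
```

Applying the Fixed- or Floating-point rule: to invert \(3 \cdot 2^{-4}\), call the function with b = 3 and get \(a = 715827882\), \(a\_exp = -31\). Then subtract \(b\_exp = -4\) to get a final exponent of \(-27\). That gives \(715827882 \cdot 2^{-27} \approx 5.333 = 16/3\).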
-
int32_t s32_mul(exponent_t *a_exp, const int32_t b, const int32_t c, const exponent_t b_exp, const exponent_t c_exp)
Compute the product of two 32-bit floating-point scalars.
b and b_exp together represent the first input \(b \cdot 2^{b\_exp}\). c and c_exp together represent the second input \(c \cdot 2^{c\_exp}\). a and a_exp together represent the result \(a \cdot 2^{a\_exp}\).
- Operation Performed
- \[\begin{aligned} a \cdot 2^{a\_exp} \leftarrow \left( b\cdot 2^{b\_exp} \right) \cdot \left( c\cdot 2^{c\_exp} \right) \end{aligned}\]
- Parameters:
a_exp – [out] Output exponent \(a\_exp\)
b – [in] First input mantissa \(b\)
c – [in] Second input mantissa \(c\)
b_exp – [in] First input exponent \(b\_exp\)
c_exp – [in] Second input exponent \(c\_exp\)
- Returns:
Output mantissa \(a\)
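The operation amounts to multiplying the mantissas and adding the exponents. The sketch below adds one more step, renormalizing so that the product fits in 32 bits. It is hypothetical: ref_s32_mul is an illustrative name, exponents are plain ints, and dropped bits are truncated toward zero. The library may round or normalize differently.

```c
#include <stdint.h>

/* Hypothetical sketch (not the library's implementation): exact 64-bit product,
 * shifted right just far enough to fit in int32_t, with the shift credited to
 * the output exponent. */
static int32_t ref_s32_mul(int *a_exp, int32_t b, int32_t c, int b_exp, int c_exp)
{
    int64_t prod = (int64_t) b * c;      /* exact: |b * c| <= 2^62 */
    int shr = 0;
    while (prod > INT32_MAX || prod < INT32_MIN) {
        prod /= 2;                       /* truncates toward zero */
        shr++;
    }
    *a_exp = b_exp + c_exp + shr;
    return (int32_t) prod;
}
```

For example, multiplying \(1.0 = 2^{30} \cdot 2^{-30}\) by itself gives a 64-bit product of \(2^{60}\). This must be shifted by 30 to fit, which yields \(2^{30} \cdot 2^{-30}\) again.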
-
sbrad_t radians_to_sbrads(const radian_q24_t theta)
Convert angle from radians to a modified binary representation.
Some trig functions, such as sbrad_sin(), do not take an angle specified in radians (e.g. radian_q24_t). Instead, they require their argument to be a modified representation of the angle, an sbrad_t. This modified binary representation takes into account various properties of the \(sin(\theta)\) function to simplify certain operations.
For any angle \(\theta\) there is a unique angle \(\alpha\) with \(-1\le\alpha\le1\) and \(sin(\frac{\pi}{2}\alpha) = sin(\theta)\). This function maps the input angle \(\theta\) onto the corresponding angle \(\alpha\) in that range and returns the result in Q1.31 format.
In this library, the unit of the resulting angle \(\alpha\) is referred to as an ‘sbrad’. The ‘brad’ part reflects that \(\alpha\) is a kind of binary angular measurement, and the ‘s’ reflects that the symmetries of \(sin(\theta)\) are what is being accounted for.
- Parameters:
theta – [in] Input angle \(\theta\), in radians (Q8.24)
- Returns:
Output angle \(\alpha\), in sbrads
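The folding described above can be written as a floating-point reference model. This is a hypothetical sketch, not the library's fixed-point implementation, and ref_radians_to_sbrads is an illustrative name. It converts Q8.24 radians to quarter turns, where \(sin(\frac{\pi}{2}u)\) has period 4. It then reduces into \([-2, 2)\) and reflects into \([-1, 1]\) using \(sin(\frac{\pi}{2}u) = sin(\frac{\pi}{2}(2-u))\) and \(sin(\frac{\pi}{2}u) = sin(\frac{\pi}{2}(-2-u))\).

```c
#include <stdint.h>

/* Hypothetical floating-point reference model (not the library's code):
 * Q8.24 radians in, alpha out as Q1.31 with sin(pi/2 * alpha) == sin(theta). */
static int32_t ref_radians_to_sbrads(int32_t theta_q24)
{
    const double half_pi = 1.57079632679489661923;
    double u = ((double) theta_q24 / 16777216.0) / half_pi;  /* quarter turns */

    u -= 4.0 * (double) (long long) (u / 4.0);   /* period 4: now u in (-4, 4) */
    if (u >= 2.0)  u -= 4.0;                     /* now u in [-2, 2) */
    if (u < -2.0)  u += 4.0;
    if (u > 1.0)   u = 2.0 - u;                  /* sin(pi/2 * u) == sin(pi/2 * (2 - u))  */
    if (u < -1.0)  u = -2.0 - u;                 /* sin(pi/2 * u) == sin(pi/2 * (-2 - u)) */

    double q = u * 2147483648.0;                 /* to Q1.31, rounding half away from zero */
    q = (q >= 0.0) ? q + 0.5 : q - 0.5;
    if (q >= 2147483647.0)  return INT32_MAX;    /* +1.0 itself is not representable */
    if (q <= -2147483648.0) return INT32_MIN;
    return (int32_t) q;                          /* conversion truncates toward zero */
}
```

With this model, \(\theta \approx \pi/6\) and \(\theta \approx 5\pi/6\) both map to \(\alpha \approx 1/3\) (about 715827883 as a Q1.31). This matches the fact that the two angles have the same sine.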
-
q2_30 sbrad_sin(const sbrad_t theta)
Compute the sine of the specified angle.
This function computes \(sin(\frac{\pi}{2}\theta)\), returning the result in Q2.30 format.
The input angle \(\theta\) must be expressed in sbrads (sbrad_t) and, interpreted as a Q1.31 value, must lie between \(-0.5\) and \(+0.5\) inclusive.
- Operation Performed
- \[\begin{aligned} & sin(\frac{\pi}{2}\theta) \end{aligned}\]
- Parameters:
theta – [in] Input angle \(\theta\), in sbrads (see radians_to_sbrads)
- Returns:
Sine of the specified angle in Q2.30 format.
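The two fixed-point formats involved (Q1.31 sbrads in, Q2.30 out) can be checked against a floating-point reference. The sketch below is hypothetical and is not claimed to be the library's algorithm; ref_sbrad_sin is an illustrative name. It uses a Taylor series for \(sin(x)\) rather than math.h. For \(|\theta| \le 0.5\) sbrads, \(|x| \le \pi/4\), and the truncation error is far below one Q2.30 LSb.

```c
#include <stdint.h>

/* Hypothetical floating-point reference for sbrad_sin() (not the library's
 * algorithm): Q1.31 sbrads -> radians, Taylor series for sin(x), Q2.30 out. */
static int32_t ref_sbrad_sin(int32_t theta)
{
    const double half_pi = 1.57079632679489661923;
    double x = half_pi * ((double) theta / 2147483648.0);   /* sbrads -> radians */

    /* sin(x) = x - x^3/3! + x^5/5! - ... , terms up to x^13/13! */
    double term = x, sum = x;
    for (int n = 1; n <= 6; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }

    double q = sum * 1073741824.0;                     /* to Q2.30 */
    return (int32_t) (q >= 0.0 ? q + 0.5 : q - 0.5);   /* round half away from zero */
}
```

For example, \(\theta = 0.5\) sbrads (Q1.31 value \(2^{30}\)) is \(\pi/4\), and the model returns 759250125, which is \(\frac{\sqrt 2}{2}\) in Q2.30.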
-
q2_30 sbrad_tan(const sbrad_t theta)
Compute the tangent of the specified angle.
This function computes \(tan(\frac{\pi}{2}\theta)\), returning the result in Q2.30 format.
The input angle \(\theta\) must be expressed in sbrads (sbrad_t) and, interpreted as a Q1.31 value, must lie between \(-0.25\) and \(+0.25\) inclusive.
- Operation Performed
- \[\begin{aligned} & tan(\frac{\pi}{2}\theta) \end{aligned}\]
- Parameters:
theta – [in] Input angle \(\theta\), in sbrads (see radians_to_sbrads)
- Returns:
Tangent of the specified angle in Q2.30 format.
-
q2_30 q24_sin(const radian_q24_t theta)
Compute the sine of the specified angle.
This function computes \(sin(\theta)\), returning the result in Q2.30 format.
- Operation Performed
- \[\begin{aligned} & sin(\theta) \end{aligned}\]
- Parameters:
theta – [in] Input angle \(\theta\), in radians (Q8.24)
- Returns:
\(sin(\theta)\) as a Q2.30
-
q2_30 q24_cos(const radian_q24_t theta)
Compute the cosine of the specified angle.
This function computes \(cos(\theta)\), returning the result in Q2.30 format.
- Operation Performed
- \[\begin{aligned} & cos(\theta) \end{aligned}\]
- Parameters:
theta – [in] Input angle \(\theta\), in radians (Q8.24)
- Returns:
\(cos(\theta)\) as a Q2.30
-
float_s32_t q24_tan(const radian_q24_t theta)
Compute the tangent of the specified angle.
This function computes \(tan(\theta)\). The result is returned as a float_s32_t containing a mantissa and exponent.
The value of \(tan(\theta)\) is considered undefined where \(\theta=\frac{\pi}{2}+k\pi\) for any integer \(k\). An exception will be raised if \(\theta\) meets this condition.
- Operation Performed
- \[\begin{aligned} & tan(\theta) \end{aligned}\]
- Parameters:
theta – [in] Input angle \(\theta\), in radians (Q8.24)
- Throws ET_ARITHMETIC:
Raised if \(tan(\theta)\) is undefined.
- Returns:
\(tan(\theta)\) as a float_s32_t
-
q2_30 q30_exp_small(const q2_30 x)
Compute \(e^x\) for a Q2.30 value near \(0\).
This function computes \(e^x\) where \(x\) is a fixed-point value with 30 fractional bits.
This function implements \(e^x\) using a truncated power series, and is only intended to be used for inputs in the range \(-0.5 \le x \le 0.5\) .
The output is also in the Q2.30 format.
For the range \(-0.5 \le x \le 0.5\), the maximum observed error (compared to exp(double) from math.h) was 2 LSbs, which corresponds to \(2^{-29}\). For the range \(-1.0 \le x \le 1.0\), the corresponding maximum observed error was 324 LSbs, or approximately \(2^{-21}\).
To compute \(e^x\) for \(x\) outside of \(\left[-0.5, 0.5\right]\), use float_s32_exp().
- Operation Performed
- \[\begin{aligned} & y \leftarrow e^x \end{aligned}\]
- Parameters:
x – [in] Input value \(x\)
- Returns:
\(y\)
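Below is a minimal Q2.30 sketch of a truncated power series for \(e^x\). It is hypothetical: the library's term count and rounding are not documented here, and the name ref_q30_exp_small, the choice of 10 terms and the round-to-nearest step are all assumptions. Each term is derived from the previous one as \(t_k = t_{k-1} \cdot x / k\), so no factorials or large powers are ever formed.

```c
#include <stdint.h>

/* Hypothetical Q2.30 sketch (not the library's code):
 * e^x ~= 1 + x + x^2/2! + ... + x^10/10!, intended for -0.5 <= x <= 0.5.
 * Each term is term_{k-1} * x / k, rounded to nearest in Q2.30. */
static int32_t ref_q30_exp_small(int32_t x)
{
    const int64_t one = INT64_C(1) << 30;      /* 1.0 in Q2.30 */
    int64_t term = one;
    int64_t sum = one;

    for (int k = 1; k <= 10; k++) {
        int64_t p = term * x;                  /* Q4.60 product, fits in int64_t */
        int64_t d = (int64_t) k * one;         /* divide by k and rescale in one step */
        term = (p >= 0) ? (p + d / 2) / d : -((-p + d / 2) / d);
        sum += term;
    }

    if (sum > INT32_MAX) return INT32_MAX;     /* e^x >= 2.0 is not representable in Q2.30 */
    return (int32_t) sum;
}
```

The saturation line also shows why the Q2.30 output format limits the usable input range. \(e^x \ge 2\) once \(x \ge \ln 2 \approx 0.693\), and the format cannot represent that.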
-
q8_24 q24_logistic(const q8_24 x)
Evaluate the logistic function at the specified point.
This function computes the value of the logistic function \(y =\frac{1}{1+e^{-x}}\). This is a sigmoidal curve bounded below by \(y = 0\) and above by \(y = 1\).
The input \(x\) and output \(y\) are both Q8.24 fixed-point values.
If speed is greatly preferred to precision, q24_logistic_fast() can be used instead.
- Operation Performed
- \[\begin{aligned} & y \leftarrow \frac{1}{1+e^{-x}} \end{aligned}\]
- Parameters:
x – [in] Input value \(x\)
- Returns:
\(y\)
-
q8_24 q24_logistic_fast(const q8_24 x)
Evaluate the logistic function at the specified point.
This function computes the value of the logistic function \(y =\frac{1}{1+e^{-x}}\). This is a sigmoidal curve bounded below by \(y = 0\) and above by \(y = 1\).
The input \(x\) and output \(y\) are both Q8.24 fixed-point values.
This implementation trades precision for speed by using a piecewise-linear approximation. If a precise result is desired, q24_logistic() should be used instead.
- Operation Performed
- \[\begin{aligned} & y \leftarrow \frac{1}{1+e^{-x}} \end{aligned}\]
- Parameters:
x – [in] Input value \(x\)
- Returns:
\(y\)
-
void s32_to_chunk_s32(int32_t a[VPU_INT32_EPV], int32_t b)
Broadcast an integer to a vector chunk.
This function broadcasts the input \(b\) to the 8 elements of \(\bar a\).
- Operation Performed
- \[\begin{aligned} & a_k \leftarrow b \end{aligned}\]
- Parameters:
a – [out] Output chunk \(\bar a\)
b – [in] Input value \(b\)
- Throws ET_LOAD_STORE:
Raised if `a` is not double-word-aligned, i.e. 8-byte aligned (see Note: Vector Alignment)
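The broadcast itself is trivial; the alignment precondition is the part that usually matters. The reference model below is hypothetical and is not the library's VPU code. It assumes a chunk of 8 elements, per the description (the constant SKETCH_INT32_EPV stands in for VPU_INT32_EPV). Where the library raises ET_LOAD_STORE, the sketch returns -1 instead.

```c
#include <stdint.h>

#define SKETCH_INT32_EPV 8   /* hypothetical stand-in for VPU_INT32_EPV (8 elements) */

/* Hypothetical reference model (not the library's code): broadcast b into the
 * chunk, refusing (return -1) where the library would raise ET_LOAD_STORE. */
static int ref_s32_to_chunk_s32(int32_t a[SKETCH_INT32_EPV], int32_t b)
{
    if ((uintptr_t) a % 8 != 0) return -1;   /* not double-word (8-byte) aligned */
    for (int k = 0; k < SKETCH_INT32_EPV; k++) a[k] = b;
    return 0;
}
```

In C11, one way to get a suitably aligned chunk is `_Alignas(8) int32_t chunk[8];`. Passing `chunk + 1` is offset by 4 bytes and fails the alignment check.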
-
void q30_powers(q2_30 a[], const q2_30 b, const unsigned N)
Get the first \(N\) powers of \(b\).
This function computes the first \(N\) powers (starting with \(0\)) of the Q2.30 input \(b\). The results are output as \(\bar a\), also in Q2.30 format.
- Operation Performed
- \[\begin{split}\begin{aligned} & a_0 \leftarrow 2^{30} = \mathtt{Q30(1.0)} \\ & a_k \leftarrow round\left(\frac{a_{k-1}\cdot b}{2^{30}}\right) \\ & \qquad\text{for }k \in \{1, 2, ..., N-1\} \end{aligned}\end{split}\]
- Parameters:
a – [out] Output \(\bar a\)
b – [in] Input \(b\)
N – [in] Number of elements of \(\bar a\) to compute
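The recurrence under Operation Performed translates directly into scalar C. The following is a hypothetical reference model, not the library's vectorized implementation; the name ref_q30_powers is illustrative, and the tie-breaking of round() (here, half away from zero) is an assumption.

```c
#include <stdint.h>

/* Hypothetical reference model (not the library's code): a[0] = 1.0 and each
 * later power is the previous one times b, rounded back to Q2.30. */
static void ref_q30_powers(int32_t a[], int32_t b, unsigned N)
{
    const int64_t half = INT64_C(1) << 29;        /* 0.5 LSb of the Q4.60 product */
    if (N == 0) return;
    a[0] = INT32_C(1) << 30;                      /* Q30(1.0) */
    for (unsigned k = 1; k < N; k++) {
        int64_t p = (int64_t) a[k - 1] * b;       /* Q4.60 */
        /* round to nearest (ties away from zero), rescale to Q2.30 */
        int64_t r = (p >= 0) ? (p + half) >> 30 : -((-p + half) >> 30);
        a[k] = (int32_t) r;                       /* assumes results stay in Q2.30 range */
    }
}
```

With \(b = 0.5\) (\(2^{29}\)) and \(N = 4\), the output is \(1, 0.5, 0.25, 0.125\), i.e. \(2^{30}, 2^{29}, 2^{28}, 2^{27}\).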
-
void s32_odd_powers(int32_t a[], const int32_t b, const unsigned count, const right_shift_t shr)
Fill vector with odd powers of \(b\).
This function populates the elements of output vector \(\bar a\) with the odd powers of input \(b\). The first count odd powers of \(b\) are output, so the highest power output will be \(2\cdot\mathtt{count}-1\).
The 64-bit product of each multiplication is right-shifted by shr bits and truncated to the 32 least significant bits. If \(b\) is a fixed-point value with shr fractional bits, then each \(a_k\) will have the same Q-format as input \(b\). shr must be non-negative.
This function neither rounds nor saturates results. It is up to the user to ensure overflows are avoided.
Typical use-case is computing a power series of a function with odd symmetry.
- Operation Performed
- \[\begin{split}\begin{aligned} & b_{sqr} = \frac{b^2}{2^{\mathtt{shr}}} \\ & a_0 \leftarrow b \\ & a_k \leftarrow \frac{a_{k-1} \cdot b_{sqr}}{2^{\mathtt{shr}}} \\ & \qquad\text{for } k \in \{1, 2, 3, ..., \mathtt{count} - 1\} \end{aligned}\end{split}\]
- Parameters:
a – [out] Output vector \(\bar a\)
b – [in] Input \(b\)
count – [in] Number of elements to output.
shr – [in] Number of bits to right-shift 64-bit products.
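A scalar reference model of the odd-powers recurrence is sketched below. It is hypothetical and not the library's code; ref_s32_odd_powers and its helper are illustrative names, and shr is taken as unsigned here. Each product is shifted right by shr, rounding toward negative infinity as an arithmetic shift does, and then truncated to its 32 LSbs, with no rounding or saturation.

```c
#include <stdint.h>

/* floor(p / 2^shr), then keep the 32 LSbs. The final cast wraps on
 * two's-complement compilers (strictly, it is implementation-defined in C). */
static int32_t shr_then_wrap(int64_t p, unsigned shr)
{
    int64_t q = (p >= 0) ? (p >> shr)
                         : -((-p + ((INT64_C(1) << shr) - 1)) >> shr);
    return (int32_t) (uint32_t) q;
}

/* Hypothetical reference model (not the library's code). */
static void ref_s32_odd_powers(int32_t a[], int32_t b, unsigned count, unsigned shr)
{
    if (count == 0) return;
    int32_t b_sqr = shr_then_wrap((int64_t) b * b, shr);     /* b^2 / 2^shr */
    a[0] = b;
    for (unsigned k = 1; k < count; k++)
        a[k] = shr_then_wrap((int64_t) a[k - 1] * b_sqr, shr);
}
```

As a typical power-series input, \(b = 0.5\) in Q2.30 with shr = 30 yields \(0.5, 0.125, 0.03125\), i.e. \(b, b^3, b^5\), all still in Q2.30.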