Scalar IEEE 754 float API

group scalar_f32_api

Functions

void f32_unpack(int32_t *mantissa, exponent_t *exp, const float input)

Unpack an IEEE 754 single-precision float into a 32-bit mantissa and exponent.

Example

// Unpack 1.52345246 * 10^(-5)
float val = 1.52345246e-5f;
int32_t mant;
exponent_t exp;
f32_unpack(&mant, &exp, val);

printf("%ld * 2^(%d) <-- %e\n", (long) mant, exp, val);

Parameters:

mantissa – [out] Unpacked output mantissa
exp – [out] Unpacked output exponent
input – [in] Float value to be unpacked
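The relationship between the two outputs can be sketched in portable C. The helper below, `sketch_f32_unpack`, is a hypothetical stand-in written for illustration, not the library's implementation. It assumes that `exponent_t` is a plain `int`, that `float` is an IEEE 754 binary32 value, and that the input is finite. The mantissa scaling chosen here (24 significant bits) is also an assumption and may differ from what `f32_unpack()` returns; the point is the invariant input = mantissa * 2^exp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int exponent_t;  /* assumption: stand-in for the library's exponent type */

/* Hypothetical helper: split a finite binary32 value into an integer
 * mantissa and a power-of-two exponent so that input == mantissa * 2^exp. */
static void sketch_f32_unpack(int32_t *mantissa, exponent_t *exp, const float input)
{
    uint32_t bits;
    assert(mantissa != NULL && exp != NULL);
    memcpy(&bits, &input, sizeof bits);      /* reinterpret the float's bit pattern */

    const uint32_t sign   = bits >> 31;
    const int32_t  biased = (int32_t)((bits >> 23) & 0xFFu);
    const int32_t  frac   = (int32_t)(bits & 0x7FFFFFu);
    assert(biased != 0xFF);                  /* infinities and NaNs are not handled */

    int32_t mant;
    if (biased == 0) {                       /* zero or subnormal: no implicit 1 */
        mant = frac;
        *exp = -126 - 23;
    } else {                                 /* normal: restore the implicit leading 1 */
        mant = frac | (INT32_C(1) << 23);
        *exp = biased - 127 - 23;
    }
    *mantissa = sign ? -mant : mant;
}
```

With this sketch, `1.0f` unpacks to mantissa 8388608 (2^23) and exponent -23, and `-0.75f` unpacks to mantissa -12582912 and exponent -24.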

void f32_unpack_s16(int16_t *mantissa, exponent_t *exp, const float input)

Unpack an IEEE 754 single-precision float into a 16-bit mantissa and exponent.

Example

// Unpack 1.52345246 * 10^(-5)
float val = 1.52345246e-5f;
int16_t mant;
exponent_t exp;
f32_unpack_s16(&mant, &exp, val);

printf("%d * 2^(%d) <-- %e\n", mant, exp, val);

Note

This operation may result in a loss of precision.

Parameters:

mantissa – [out] Unpacked output mantissa
exp – [out] Unpacked output exponent
input – [in] Float value to be unpacked
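The precision loss arises because a single-precision float carries 24 significant bits, while an `int16_t` mantissa holds at most 15 bits plus sign. The sketch below illustrates this with a hypothetical helper, `sketch_narrow_to_s16`, that narrows an already-unpacked 32-bit mantissa. It assumes `exponent_t` is a plain `int` and truncates the discarded bits toward zero; whether `f32_unpack_s16()` truncates or rounds is not stated in this section.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef int exponent_t;  /* assumption: stand-in for the library's exponent type */

/* Hypothetical helper: shrink a 32-bit mantissa until its magnitude fits in
 * an int16_t, raising the exponent by the number of bits shifted out. */
static void sketch_narrow_to_s16(int16_t *mant16, exponent_t *exp16,
                                 const int32_t mant32, const exponent_t exp32)
{
    assert(mant16 != NULL && exp16 != NULL);

    /* Work on the magnitude so the shift is well defined for negative input. */
    const uint32_t mag = (mant32 < 0) ? (uint32_t)(-(int64_t)mant32)
                                      : (uint32_t)mant32;
    int shift = 0;
    while ((mag >> shift) > (uint32_t)INT16_MAX) {
        shift++;                             /* each step discards one low bit */
    }

    const int16_t m = (int16_t)(mag >> shift);
    *mant16 = (mant32 < 0) ? (int16_t)-m : m;
    *exp16  = exp32 + shift;
}
```

For example, the pair (12582913, -24) narrows to (24576, -15). The original mantissa was odd, so its lowest bit is lost: 24576 * 2^9 is 12582912, not 12582913.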

float_s32_t f32_to_float_s32(const float x)

Convert an IEEE 754 float to a float_s32_t.

Parameters:

x – [in] Input value

Throws ET_ARITHMETIC:

Raised if `x` is infinite or NaN

Returns:

float_s32_t representation of x

float_s32_t f64_to_float_s32(const double x)

Convert an IEEE 754 double to a float_s32_t.

Note

This operation may result in precision loss.

Parameters:

x – [in] Input value

Throws ET_ARITHMETIC:

Raised if `x` is infinite or NaN

Returns:

float_s32_t representation of x

float f32_sin(const float theta)

Get the sine of a specified angle.

Computes \(\sin(\theta)\) using the power series expansion of \(\sin()\) truncated to 8 terms.

This implementation is meant to make optimal use of the XS3 floating-point unit.

Parameters:

theta – [in] Angle \(\theta\) to compute the sine of (in radians)

Throws ET_ARITHMETIC:

Raised if \(\theta\) is infinite or NaN

Returns:

Sine of the angle \(\theta\)
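A truncated sine series can be sketched as follows. The helper `sketch_sin8` is hypothetical and uses the textbook Maclaurin coefficients with no range reduction. The coefficients and any argument reduction used by `f32_sin()` are not given in this section, so treat this as an illustration of the technique, not as the library's algorithm.

```c
/* Hypothetical helper: Maclaurin series of sin truncated to 8 terms,
 *   sin(x) ~= sum_{k=0}^{7} (-1)^k * x^(2k+1) / (2k+1)!
 * No range reduction is performed, so accuracy degrades as |theta| grows. */
static float sketch_sin8(const float theta)
{
    const float x2 = theta * theta;
    float term = theta;                      /* k = 0 term: x */
    float sum  = theta;
    for (int k = 1; k < 8; k++) {
        /* next term = previous term * (-x^2) / ((2k) * (2k+1)) */
        term *= -x2 / (float)((2 * k) * (2 * k + 1));
        sum  += term;
    }
    return sum;
}
```

Under these assumptions, `sketch_sin8(0.5235988f)` (approximately pi/6) evaluates to a value very close to 0.5.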

float f32_cos(const float theta)

Get the cosine of a specified angle.

Computes \(\cos(\theta) = \sin(\theta+\frac{\pi}{2})\) using the power series expansion of \(\sin()\) truncated to 8 terms.

This implementation is meant to make optimal use of the XS3 floating-point unit.

Parameters:

theta – [in] Angle \(\theta\) to compute the cosine of (in radians)

Throws ET_ARITHMETIC:

Raised if \(\theta\) is infinite or NaN

Returns:

Cosine of the angle \(\theta\)
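The phase-shift identity used by this function follows from the angle-addition formula for sine. As a short check:

\[\begin{aligned} \sin\left(\theta+\frac{\pi}{2}\right) &= \sin(\theta)\cos\left(\frac{\pi}{2}\right) + \cos(\theta)\sin\left(\frac{\pi}{2}\right) \\ &= \sin(\theta)\cdot 0 + \cos(\theta)\cdot 1 = \cos(\theta) \end{aligned}\]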

float f32_log2(const float x)

Get the base-2 logarithm of the specified value.

This function computes \(\log_2(x)\) using the power series expansion of \(\log_2()\) truncated to 11 terms.

Parameters:

x – [in] Input value \(x\) to get the logarithm of.

Throws ET_ARITHMETIC:

Raised if \(x\) is infinite or NaN

Returns:

\(\log_2(x)\)
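This section does not say which series or range reduction `f32_log2()` uses. As one possible illustration, the hypothetical helper `sketch_log2` below first scales the input into [0.5, 1) and then sums 11 terms of the series ln(a) = 2 * (s + s^3/3 + s^5/5 + ...), with s = (a - 1)/(a + 1). Both the choice of series and the normalization step are assumptions made for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: log2(x) for finite x > 0.
 * Step 1: write x = a * 2^p with 0.5 <= a < 1.
 * Step 2: ln(a) = 2 * sum_{k=0}^{10} s^(2k+1) / (2k+1), s = (a-1)/(a+1).
 * Step 3: log2(x) = p + ln(a) / ln(2). */
static float sketch_log2(const float x)
{
    assert(x > 0.0f && x - x == 0.0f);       /* positive and finite */

    float a = x;
    int p = 0;
    while (a >= 1.0f) { a *= 0.5f; p++; }    /* scaling by 2 is exact */
    while (a <  0.5f) { a *= 2.0f; p--; }

    const float s  = (a - 1.0f) / (a + 1.0f);
    const float s2 = s * s;
    float power = s;                         /* holds s^(2k+1) */
    float sum   = 0.0f;
    for (int k = 0; k <= 10; k++) {          /* 11 terms */
        sum   += power / (float)(2 * k + 1);
        power *= s2;
    }

    const float ln2 = 0.69314718f;
    return (float)p + (2.0f * sum) / ln2;
}
```

Because |s| is at most 1/3 after normalization, the terms shrink quickly and 11 of them are more than enough for single precision in this sketch.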

float f32_power_series(const float x, const float b[], const unsigned N)

Compute power series summation using specified coefficients.

This function is used to compute the sum of terms in a power series, truncated to \(N\) terms, starting with the \(x^0\) term.

b is an \(N\)-element vector of coefficients \(\bar b\) which are multiplied by the corresponding powers of \(x\).

\(N\) is the length of \(\bar b\) and the number of terms to sum together.

Operation Performed

\[\begin{aligned} & a \leftarrow \sum_{k=0}^{N-1}\left( x^k \cdot b_k \right) \end{aligned}\]

Parameters:

x – [in] Input value \(x\).
b – [in] Vector of coefficients \(\bar b\).
N – [in] Number of power series terms to sum.

Throws ET_ARITHMETIC:

Raised if \(x\) or any element of \(\bar b\) is infinite or NaN.

Returns:

\(a\), the sum of the first \(N\) power series terms.
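The summation can be evaluated without computing each power of \(x\) explicitly. The hypothetical helper `sketch_power_series` below uses Horner's rule, which is mathematically equivalent to the sum above. Whether `f32_power_series()` evaluates the series this way is an assumption, and results may differ from it in the last bits because of rounding order.

```c
/* Hypothetical helper: evaluate sum_{k=0}^{N-1} b[k] * x^k with Horner's rule. */
static float sketch_power_series(const float x, const float b[], const unsigned N)
{
    float a = 0.0f;
    for (unsigned k = N; k > 0; k--) {
        a = a * x + b[k - 1];                /* fold in b[N-1] down to b[0] */
    }
    return a;
}
```

For example, with coefficients {1, 2, 3} and x = 2, the sketch computes 1 + 2*2 + 3*4 = 17. With N = 0 it returns 0.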

float f32_normA(exponent_t *p, const float x)

Get a representation of the input \(x\) in normalized form A.

This function is used internally to transform a float value into a representation required for certain purposes.

In particular, this function behaves much like frexpf(): the returned value \(a\) is guaranteed either to be \(0\) or to satisfy \(0.5 \le \left| a \right| < 1.0\), and the output exponent \(p\) is such that \(x = a \cdot 2^{p}\).

In anticipation that future work may require alternative “normalized” representations, this form is being defined here as form A.

Parameters:

p – [out] Output exponent \(p\)
x – [in] Input value \(x\)

Throws ET_ARITHMETIC:

Raised if \(x\) is infinite or NaN.

Returns:

\(a\) in normalized form A.
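The normalization described above can be sketched in portable C. The helper `sketch_normA` is hypothetical and assumes `exponent_t` is a plain `int`. It reaches the same form as `frexpf()` by repeated scaling by two, and it uses an assertion where the library raises an exception.

```c
#include <assert.h>
#include <stddef.h>

typedef int exponent_t;  /* assumption: stand-in for the library's exponent type */

/* Hypothetical helper: return a with x == a * 2^(*p), where a is 0 or
 * 0.5 <= |a| < 1.0. Requires finite x. */
static float sketch_normA(exponent_t *p, const float x)
{
    assert(p != NULL);
    assert(x - x == 0.0f);                   /* rejects infinities and NaNs */

    exponent_t e = 0;
    float a = x;
    if (x != 0.0f) {
        float mag = (x < 0.0f) ? -x : x;
        while (mag >= 1.0f) { mag *= 0.5f; e++; }   /* scaling by 2 is exact */
        while (mag <  0.5f) { mag *= 2.0f; e--; }
        a = (x < 0.0f) ? -mag : mag;
    }
    *p = e;
    return a;
}
```

For instance, an input of 6.0 yields a = 0.75 with p = 3, and an input of -0.125 yields a = -0.5 with p = -2.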