The isutf8 function checks whether a string is a valid UTF-8 encoded sequence. It returns a boolean indicating whether the input conforms to UTF-8 encoding rules.
isutf8 is useful when working with data from external sources such as logs, telemetry events, or data pipelines, where encoding issues can cause downstream processing to fail or produce incorrect results. By filtering out or isolating invalid UTF-8 strings, you can ensure better data quality and avoid unexpected behavior during parsing or transformation.
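To make "UTF-8 encoding rules" concrete, the Python sketch below checks a byte sequence against the well-formed byte ranges defined in RFC 3629. It is an illustration only, not APL's implementation. It assumes isutf8 follows the standard definition, which rejects overlong forms, UTF-16 surrogates, and code points above U+10FFFF. The helper name is_utf8 is hypothetical. It takes bytes rather than a Python str, because a str holds decoded code points, not raw bytes.

```python
def is_utf8(data: bytes) -> bool:
    """Hypothetical helper: return True if `data` is well-formed UTF-8 (RFC 3629)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # 1-byte sequence (ASCII)
            i += 1
            continue
        # Pick the sequence length and the allowed range for the second byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rejects surrogates U+D800..U+DFFF
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rejects code points above U+10FFFF
        else:
            return False  # 0x80..0xC1 and 0xF5..0xFF never start a sequence
        if i + length > n:
            return False  # truncated sequence at the end of the input
        if not lo <= data[i + 1] <= hi:
            return False
        # Any remaining bytes must be plain continuation bytes (0x80..0xBF).
        for k in range(i + 2, i + length):
            if not 0x80 <= data[k] <= 0xBF:
                return False
        i += length
    return True


print(is_utf8(b"GET"))             # True
print(is_utf8(b"\xff/broken-path"))  # False: 0xFF can never appear in UTF-8
```

Python's built-in strict decoder (`data.decode("utf-8")`) enforces the same rules, so in practice you would usually call it. The hand-written loop just makes each rule visible.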
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
Splunk SPL users
Splunk doesn’t provide a built-in function to directly check whether a string is valid UTF-8. Users typically rely on workarounds using field transformations or regex, which can be error-prone or incomplete. APL provides isutf8 as a simple and reliable alternative.
ANSI SQL users
ANSI SQL doesn’t define a standard function to validate UTF-8 encoding in strings. Some platforms offer vendor-specific functions, but their behavior varies. APL offers isutf8 as a consistent, built-in way to validate string encoding.
Usage
Syntax
isutf8(value)
Parameters
Name | Type | Description |
---|---|---|
value | string | The input string to validate. |
Returns
A bool value: true if the input string is valid UTF-8, false otherwise.
Use case examples
You can use isutf8 to detect and exclude malformed UTF-8 entries in HTTP request logs that could indicate issues with upstream data encoding.
Output
_time | id | method | uri | status |
---|---|---|---|---|
2025-07-09T13:32:05Z | user42 | GET | �/broken-path | 500 |
2025-07-09T14:10:17Z | user99 | POST | /submit-form%80 | 200 |
This query identifies records where the uri or method fields contain invalid UTF-8 characters, which may point to upstream client encoding issues or malformed requests.
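The same filtering idea can be sketched outside APL. The Python example below is hypothetical: the records, the byte-valued uri and method fields, and the helper names is_utf8 and split_by_encoding are assumptions for illustration. It relies on Python's strict UTF-8 decoder, not on APL's isutf8. It partitions rows into valid and malformed groups so that you can either drop the bad rows or inspect them separately.

```python
def is_utf8(data: bytes) -> bool:
    """Hypothetical helper: strict UTF-8 check via Python's built-in decoder."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def split_by_encoding(rows, fields=("uri", "method")):
    """Partition rows into (valid, malformed) based on the given byte fields."""
    valid, malformed = [], []
    for row in rows:
        if all(is_utf8(row[f]) for f in fields):
            valid.append(row)
        else:
            malformed.append(row)
    return valid, malformed


# Hypothetical sample rows; raw field values are bytes because invalid
# UTF-8 can't be stored in a Python str without decoding first.
records = [
    {"id": "user42", "method": b"GET", "uri": b"\xff/broken-path", "status": 500},
    {"id": "user7", "method": b"GET", "uri": b"/index.html", "status": 200},
    {"id": "user99", "method": b"POST", "uri": b"/submit-form\x80", "status": 200},
]

valid, malformed = split_by_encoding(records)
print([r["id"] for r in malformed])  # ['user42', 'user99']
```

Keeping the malformed rows instead of discarding them mirrors the "filter out or isolate" choice described earlier: you can route them to a separate store for debugging upstream clients.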