
Type Inference and Missing Values

This page describes the type inference rules used by the FSharp.Data type providers (CSV, JSON, XML and HTML). Understanding these rules helps you know what F# types to expect for each property, and how to handle missing, null, or optional values at runtime.

Overview

All FSharp.Data type providers infer types from a sample document (or a list of samples) at compile time (design time). The generated F# types reflect the structure of the sample. At runtime, any document with a compatible structure can be read — but the generated types are fixed by the sample.

A key principle: the sample should be representative. If a property is present in the sample but absent from runtime data, it can raise a KeyNotFoundException. Conversely, if runtime data contains new properties not in the sample, they are not accessible via the generated type (though they may still be reachable through the underlying JsonValue, XElement, etc.).
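The fallback to the underlying value can be sketched as follows. This is a minimal example for F# Interactive; the `#r "nuget: FSharp.Data"` line, the `Person` type name and the `email` key are assumptions made for illustration, not part of the page above.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical sample: the provider only learns about "name"
type Person = JsonProvider<""" { "name": "Alice" } """>

// The runtime document carries an extra "email" key that the sample lacks
let person = Person.Parse(""" { "name": "Bob", "email": "bob@example.com" } """)

// There is no generated person.Email, so read it from the untyped JsonValue
let email =
    person.JsonValue.TryGetProperty("email")
    |> Option.map (fun v -> v.AsString())

printfn "%s: %A" person.Name email
```

Because `TryGetProperty` returns an option, the same code also copes with documents that do not contain the extra key.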

Numeric Type Inference

When inferring numeric types, the providers prefer the most precise type that can represent all values. The preference order (most preferred first) is:

  1. int – 32-bit signed integer
  2. int64 – 64-bit signed integer
  3. decimal – exact decimal arithmetic (preferred for financial/monetary values)
  4. float – 64-bit floating point (used when decimal cannot represent the value, or when missing values appear in a CSV column that would otherwise be decimal)

If values in a column or array mix two types, the provider automatically promotes to the wider type. For example, a JSON array [1, 2, 3.14] will produce decimal values.

open FSharp.Data

// int is inferred when all values are integers
type IntsOnly = JsonProvider<""" [1, 2, 3] """>

// decimal is inferred when any value has a fractional part
type WithDecimal = JsonProvider<""" [1, 2, 3.14] """>
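
One way to confirm which element type was inferred is to annotate the parsed result, since the annotation only compiles if it matches. The sketch below assumes an F# Interactive script; the type names `Ints` and `Mixed` and the runtime inputs are hypothetical.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

type Ints = JsonProvider<""" [1, 2, 3] """>
type Mixed = JsonProvider<""" [1, 2, 3.14] """>

// These annotations compile only if inference produced int and decimal
let ints: int[] = Ints.Parse(""" [10, 20, 30] """)
let mixed: decimal[] = Mixed.Parse(""" [1.5, 2, 2.5] """)

printfn "%d" (Array.sum ints)
printfn "%M" (Array.sum mixed)
```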

Boolean Inference (CSV)

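In CSV files, a column whose values are all drawn from the set 0, 1, Yes, No, True, False (case-insensitive) is inferred as bool. If the column contains any other value, it is not inferred as bool and the usual inference rules apply to it. Set StrictBooleans=true to restrict boolean inference to true and false; values such as 0, 1, yes and no are then treated as integers or strings respectively.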
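
As a small illustration of the rule, the sketch below uses a hypothetical two-column sample whose `Active` column contains only Yes and No. It assumes an F# Interactive script; the `Users` type name and the data are made up for the example.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical sample: "Active" holds only Yes/No values
type Users = CsvProvider<"Name,Active\nAlice,Yes\nBob,No">

let activeNames =
    Users.GetSample().Rows
    |> Seq.filter (fun row -> row.Active) // compiles only because Active is bool
    |> Seq.map (fun row -> row.Name)
    |> List.ofSeq

printfn "%A" activeNames
```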

Date and Time Inference

The providers recognise date and time strings in standard ISO 8601 formats:

| Inferred Type | When Used | Example Value |
|---|---|---|
| DateTime | Date + time strings (default) | "2023-06-15T12:00:00" |
| DateTimeOffset | Date + time + timezone offset (always) | "2023-06-15T12:00:00+02:00" |
| DateTimeOffset | Any date + time string when PreferDateTimeOffset=true | "2023-06-15T12:00:00" |
| DateOnly (.NET 6+) | Date-only strings when PreferDateOnly=true | "2023-06-15" |
| TimeOnly (.NET 6+) | Time-only strings when PreferDateOnly=true | "12:00:00" |

By default (PreferDateOnly = false), date-only strings such as "2023-06-15" are inferred as DateTime for backward compatibility. Set PreferDateOnly = true on .NET 6 and later to infer them as DateOnly instead.

Set PreferDateTimeOffset = true to infer all date-time values (that would otherwise be DateTime) as DateTimeOffset instead. Values that already carry an explicit timezone offset (e.g. "2023-06-15T12:00:00+02:00") are always inferred as DateTimeOffset regardless of this flag. PreferDateTimeOffset and PreferDateOnly are independent: DateOnly values stay as DateOnly even when PreferDateTimeOffset=true.

If a column mixes DateOnly and DateTime values, they are unified to DateTime.
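
The default behaviour for the first two rows of the table can be checked with type annotations. This is a sketch under the assumption of an F# Interactive script; the `Event` type and its two property names are hypothetical.

```fsharp
#r "nuget: FSharp.Data"
open System
open FSharp.Data

// Hypothetical sample: one value without an offset, one with an explicit offset
type Event =
    JsonProvider<""" { "created": "2023-06-15T12:00:00",
                       "updated": "2023-06-15T12:00:00+02:00" } """>

let ev = Event.GetSample()

// The annotations compile only if the inferred types match the table
let created: DateTime = ev.Created
let updated: DateTimeOffset = ev.Updated

printfn "%d %A" created.Year updated.Offset
```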

Missing Values and Optionals

This is the most important topic for understanding how the providers behave at runtime. The rules differ slightly across providers.

JSON Provider

In JSON, a property can be absent from an object, or its value can be the null literal. The JSON type provider handles both cases the same way: if a property is missing or null in at least one sample, it is inferred as an option type.

This means None represents either a missing key or a null value at runtime.

// 'age' is missing from the second record → inferred as option<int>
type People =
    JsonProvider<"""
  [ { "name":"Alice", "age":30 },
    { "name":"Bob" } ] """>

for person in People.GetSamples() do
    printf "%s" person.Name

    match person.Age with
    | Some age -> printfn " (age %d)" age
    | None -> printfn " (age unknown)"

Output:

Alice (age 30)
Bob (age unknown)

Important runtime note: If a property is present and non-null in all samples, it will be inferred as a non-optional type. If such a property is then absent or null in runtime data, accessing it will throw a runtime exception. Use multiple samples (or SampleIsList=true) to ensure optional properties are correctly modelled.
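
A sketch of the SampleIsList approach follows: each array element is treated as a separate sample, so a null in one of them is enough to make the property optional. It assumes an F# Interactive script; the `Person` type, the `describe` helper and the runtime documents are hypothetical.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Two samples of the same object; "age" is null in the second one
type Person =
    JsonProvider<""" [ { "name": "Alice", "age": 30 },
                       { "name": "Bob", "age": null } ] """, SampleIsList=true>

// Hypothetical helper: parse one object and describe it
let describe (json: string) =
    let p = Person.Parse(json)
    match p.Age with
    | Some age -> sprintf "%s (age %d)" p.Name age
    | None -> sprintf "%s (age unknown)" p.Name

printfn "%s" (describe """ { "name": "Carol", "age": 41 } """)
printfn "%s" (describe """ { "name": "Dave" } """)
printfn "%s" (describe """ { "name": "Erin", "age": null } """)
```

With this sample, both the missing key and the explicit null are read as None instead of raising an exception.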

Null values in JSON

A JSON null value that appears as the value of a typed property is treated as None. A null value in a heterogeneous context (e.g. an array of numbers and nulls) is represented via the option mechanism on the generated accessor.

CSV Provider

CSV files do not have a native null/missing concept. Instead, certain string values are treated as missing. By default, the following strings (case-insensitive) are recognised as missing: NaN, NA, N/A, #N/A, :, -, TBA, TBD (and empty string "").

You can override this list with the MissingValues static parameter.
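
For instance, a data source that marks gaps with a custom token can declare that token as the missing value. The sketch below assumes an F# Interactive script; the `Stock` type, the column names and the token `unknown` are hypothetical, and PreferOptionals=true is added so that the gap surfaces as None.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical sample: "unknown" marks a missing quantity
type Stock =
    CsvProvider<"Fruit,Quantity\napples,3\npears,unknown",
                MissingValues="unknown", PreferOptionals=true>

let quantities =
    Stock.GetSample().Rows
    |> Seq.map (fun row -> row.Fruit, row.Quantity)
    |> List.ofSeq

printfn "%A" quantities
```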

When a column has at least one missing value, the inferred type changes as follows:

| Base type | With missing values (default) | With PreferOptionals=true |
|---|---|---|
| int | Nullable<int> (int?) | int option |
| int64 | Nullable<int64> (int64?) | int64 option |
| decimal | float (using Double.NaN) | float option |
| float | float (using Double.NaN) | float option |
| bool | bool option | bool option |
| DateTime | DateTime option | DateTime option |
| DateTimeOffset | DateTimeOffset option | DateTimeOffset option |
| DateOnly | Nullable<DateOnly> | DateOnly option |
| Guid | Guid option | Guid option |
| string | string (empty string "" for missing) | string option |

The key differences between the default and PreferOptionals=true:

  - In the default mode, integers use Nullable<T> and decimals are widened to float with Double.NaN.
  - With PreferOptionals=true, all types use T option and you never get Double.NaN or Nullable<T>.
  - Strings are never made into string option by default (empty string represents missing); use PreferOptionals=true to get string option.

Design-time safety: If your sample file contains no missing values in a column, but you know that production data may have missing values, set AssumeMissingValues=true to force the provider to treat all columns as nullable/optional.

// With AssumeMissingValues=true, all columns become nullable/optional
// even if the sample has no missing values
type SafeCsv = CsvProvider<"A,B\n1,2\n3,4", AssumeMissingValues=true>

// With PreferOptionals=true, all columns use 'option' instead of Nullable or NaN
type OptionalsCsv = CsvProvider<"A,B\n1,2\n3,4", PreferOptionals=true>
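
To see how the two representations differ in calling code, the sketch below reads the same hypothetical sample in both modes. It assumes an F# Interactive script; the `Sample` literal and the type names `DefaultCsv` and `OptionCsv` are made up, and the missing integer is written as an empty field.

```fsharp
#r "nuget: FSharp.Data"
open System
open FSharp.Data

// Hypothetical sample: the second row has an empty Id
[<Literal>]
let Sample = "Id,Name\n1,Alice\n,Bob"

type DefaultCsv = CsvProvider<Sample>
type OptionCsv = CsvProvider<Sample, PreferOptionals=true>

// Default mode: a missing integer surfaces as Nullable<int>
let defaultIds =
    DefaultCsv.GetSample().Rows
    |> Seq.map (fun row ->
        let id: Nullable<int> = row.Id
        if id.HasValue then string id.Value else "missing")
    |> List.ofSeq

// PreferOptionals=true: the same column is an int option
let optionIds =
    OptionCsv.GetSample().Rows
    |> Seq.map (fun row ->
        match row.Id with
        | Some id -> string id
        | None -> "missing")
    |> List.ofSeq

printfn "%A %A" defaultIds optionIds
```

The option form tends to fit F# pattern matching better, while Nullable<T> can be convenient when the rows are passed to other .NET code.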

XML Provider

In XML, values can be missing at the attribute or element level:

// 'born' attribute missing from one author → option<int>
type Authors =
    XmlProvider<"""
  <authors>
    <author name="Karl Popper" born="1902" />
    <author name="Thomas Kuhn" />
  </authors>
  """>

let sample = Authors.GetSample()

for author in sample.Authors do
    printf "%s" author.Name

    match author.Born with
    | Some year -> printfn " (born %d)" year
    | None -> printfn ""

Note: If an attribute or element is absent from all sample data but present at runtime, it cannot be accessed through the generated type. You must include at least one occurrence (possibly with a dummy value) in the sample to have the provider generate an optional property.
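
When changing the sample is not an option, the underlying XElement can still be queried. The sketch below reuses the sample from this section and parses a hypothetical runtime document with a `died` attribute that the sample never mentions; it assumes an F# Interactive script.

```fsharp
#r "nuget: FSharp.Data"
open System.Xml.Linq
open FSharp.Data

type Authors =
    XmlProvider<"""
  <authors>
    <author name="Karl Popper" born="1902" />
    <author name="Thomas Kuhn" />
  </authors>
  """>

// Hypothetical runtime document: no 'born', plus an unknown 'died' attribute
let doc = Authors.Parse("""<authors><author name="Imre Lakatos" died="1974" /></authors>""")

let summaries =
    doc.Authors
    |> Array.map (fun author ->
        // 'born' is optional in the generated type
        let born = author.Born |> Option.map string |> Option.defaultValue "unknown"
        // 'died' has no generated property, so go through the raw XElement
        let died = author.XElement.Attribute(XName.Get "died")
        let diedText = if isNull died then "unknown" else died.Value
        sprintf "%s (born %s, died %s)" author.Name born diedText)

summaries |> Array.iter (printfn "%s")
```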

Heterogeneous Types

Sometimes a property can hold values of different types. The JSON type provider handles this by generating a type with multiple optional accessors — one per observed type.

// Value can be int or string → generates .Number and .String accessors
type HetValues = JsonProvider<""" [{"value":94}, {"value":"hello"}] """>

for item in HetValues.GetSamples() do
    match item.Value.Number, item.Value.String with
    | Some n, _ -> printfn "Number: %d" n
    | _, Some s -> printfn "String: %s" s
    | _ -> ()

Output:

Number: 94
String: hello

Design-Time vs Runtime Behaviour

The type providers perform inference at compile time using the sample document. At runtime, the actual data is parsed against the inferred schema. This has a few important implications:

  1. Properties that are required at design-time may be missing at runtime. If a property is always present and non-null in your sample, the provider generates a non-optional accessor. If runtime data omits that property, a KeyNotFoundException is thrown when you access it.

  2. New properties in runtime data are ignored. If runtime JSON has extra keys that are not in the sample, those keys are simply not accessible via the generated type.

  3. The sample should cover the full range of variability. Include examples of all optional properties and heterogeneous value types in your sample. Use SampleIsList=true for JSON/XML when the root is an array of samples.

  4. Runtime errors are lazy. The providers do not validate the entire document on load. A missing or mistyped field only causes an error when that specific property is accessed.
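
Points 1 and 4 can be observed together in a short sketch: parsing succeeds, and the failure only appears when the missing property is read. It assumes an F# Interactive script; the `Person` type and the runtime document are hypothetical, and the handler catches any exception rather than relying on a specific exception type.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// 'age' is always present in the sample, so it is inferred as a plain int
type Person = JsonProvider<""" { "name": "Alice", "age": 30 } """>

// Parsing a document without 'age' does not fail at load time
let person = Person.Parse(""" { "name": "Bob" } """)
printfn "Loaded %s" person.Name

// The error only surfaces when the missing property is accessed
let ageResult =
    try
        Ok person.Age
    with ex ->
        Error ex.Message

match ageResult with
| Ok age -> printfn "Age: %d" age
| Error message -> printfn "Accessing Age failed: %s" message
```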

Summary of Inference-Control Parameters

The following static parameters let you override the default inference behaviour:

| Parameter | Providers | Effect |
|---|---|---|
| PreferOptionals | CSV, JSON, XML | Use T option for all missing/null values instead of Nullable<T> or Double.NaN |
| AssumeMissingValues | CSV | Treat every column as nullable/optional even if the sample has no missing values |
| MissingValues | CSV | Comma-separated list of strings to recognise as missing (replaces defaults) |
| InferRows | CSV | Number of rows to use for type inference (default 1000; 0 = all rows) |
| SampleIsList | JSON, XML | Treat the top-level array as a list of sample objects, not a single sample |
| PreferDateOnly | CSV, JSON, XML | Infer date-only strings as DateOnly on .NET 6+ (default false) |
| PreferDateTimeOffset | CSV, JSON, XML | Infer all date-time values as DateTimeOffset instead of DateTime (default false) |
| InferenceMode | JSON, XML | Enable inline schema annotations (ValuesAndInlineSchemasHints or ValuesAndInlineSchemasOverrides) |
| Schema | CSV | Override column names and/or types directly |

For full details on each parameter, see the individual provider documentation: CSV · JSON · XML · HTML
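
As one example of overriding inference, the CSV Schema parameter can pin a single column by name using the ColumnName=type form. In the sketch below the `Code` column would otherwise be inferred as an integer and lose its leading zeros. It assumes an F# Interactive script; the `Parts` type and the data are hypothetical.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical sample: codes look numeric but the leading zeros matter
type Parts = CsvProvider<"Id,Code\n1,007\n2,042", Schema="Code=string">

let codes =
    Parts.GetSample().Rows
    |> Seq.map (fun row -> row.Id, row.Code)
    |> List.ofSeq

printfn "%A" codes
```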
