We publish sample apps as a means of demonstrating Calcapp features and techniques that you may want to use in your own apps that you distribute to your own users. This sample app is different — the intended user is you, the app author, and it helps you convert spreadsheet data to formulas that use the new XLOOKUP function.

Read this blog post to get a high-level overview of how to use this app to import table data and to learn how XLOOKUP works. This post explains in detail how to use the app converting spreadsheet data and how it was written.

To summarize, Calcapp now supports the XLOOKUP function, which is
equivalent to the classic spreadsheet functions VLOOKUP and HLOOKUP.
XLOOKUP accepts the sought value and two arrays, one containing
values to find (the *lookup array*) and one containing the
values to return (the *result array*).

This formula returns 300, because the sought value, 30, is the third element in the first array, which causes XLOOKUP to return the third element in the second array, 300:

As Calcapp does not yet have native support for data tables, the data must be converted to XLOOKUP formulas. That’s the job of this app.

## Usage

Here’s an embedded version of the app, which you can also run as a stand-alone app:

For step-by-step instructions on how to use this app, refer to this blog post.

Copy the lookup values from a spreadsheet table and paste them into
the **Lookup values** field. Then, copy the result
values and paste them into the **Result values** field.
The generated XLOOKUP formula then becomes available.

Be sure to select the decimal separator you’re using using the
**Decimal separator** field, to ensure that the
generated formula works well with Calcapp Creator. If you use a
decimal comma as a decimal separator, the app will use semicolons to
separate parameters. If you use a decimal point as a decimal
separator, the app will use commas to separate parameters.

Also, the app automatically detects if the values are numbers or text. If they are numbers (potentially starting with a currency symbol, such as “$”), the elements of the generated array are not quoted, ensuring that you get numbers — and not text — back when you call the function.

## Under the hood

This app makes heavy use of Calcapp’s new formula engine, which we introduced with our November release. Thanks to the new features, this app is considerably simpler than the sample app that generates CHOOSE formulas we wrote in 2018.

You’re encouraged to view the app and its formulas as you read along.
It’s available as a template when you create a new app under the
*Sample: XLOOKUP generator* name.

**Note:** This post explains in detail how the XLOOKUP
generator app is implemented. Feel free to skip it altogether if
you’re just interested in using the app and not in how it was
written.

## Handling separators

This app works with both decimal points and decimal commas. It is important for the app to have access to this information, because the generated XLOOKUP formula uses commas as a parameter separator if a decimal point is selected and semicolons otherwise. Also, the app recognizes when users enter numbers as lookup or result values and leaves out quotation marks if that is the case.

There is a text drop-down field named *Decimal separator*. In
order to insulate the rest of the app from having to know about the
actual wording of the options available to the user, there are two
hidden text fields
that expose the chosen decimal separator and parameter separator as
text strings.

The first is named *DecimalSeparatorText* and this is its
value formula:

The second is named *ParameterSeparatorText* and this is its
value formula:

## Detecting numbers with a regular expression

It is important that the app can recognize numbers. To do so, we would normally use a function like PARSENUMBER, which returns an error if the given text string is not a number and the parameter as a number otherwise. This doesn’t work with this app, though, as it needs to be able to work with both decimal points and decimal commas.

PARSENUMBER does not work for this app, as it takes the language of the app into account and its decimal separator (a decimal point for US English and a decimal comma for German, for instance). PARSENUMBER can’t be coaxed into working with decimal commas if the language of the app uses a decimal point, and vice versa.

As a result, we need to detect if a text string is a number using a regular expression. These text strings can be notoriously tricky to get right, and we’ll be the first to admit that writing this sample app required a trip to the excellent regex101.com to make our regular expression work.

(Have you never heard of regular expressions before? They are a powerful tool to work with text. We wrote a lengthy introduction back in 2018 when Calcapp gained support for them. Be sure to read that post first if you want to understand this section. If not, feel free to skip ahead.)

To determine if a text string matches a regular expression, we need
to use the REGEXMATCH function.
`\d`

matches
a single digit, and `\d+`

matches one or
multiple digits.

As a result, both these formulas return TRUE, indicating that both
“123” and “123456789” match the regular expression `\d+`

:

The regular expression `\d+`

can be used to
match multiple integers, but what about decimal numbers? If a decimal
point is used, `\d+\.\d+`

matches a
decimal number. (The backslash before the period is there because
periods are otherwise special characters in regular expressions that
match anything. With a backslash, the period in the formula above
matches a literal period.)

To make the regular expression work for arbitrary numbers — integers or decimal numbers — we need to make the decimal part optional. Consider this regular expression:

`\d+(\.\d+)?`

The part in parentheses matches the decimal part of the number. By putting a question mark after the closing parenthesis, the entire decimal part is made optional.

There are five more changes we need to make to make the regular expression complete, though. First, it must accept negative numbers, so we add a leading minus sign, followed by a question mark to make the minus sign optional:

`-?\d+(\.\d+)?`

Second, it must accept numbers that are preceded by a currency symbol
(such as “$”). We don’t want these lines to count as text lines.
`\p{Sc}`

matches a currency symbol in a regular expression. It needs to go
after the minus sign:

`-?\p{Sc}?\d+(\.\d+)?`

Third, we need to accept decimal numbers where the zero isn’t spelled
out (like in .1). To do so, we must change the first `+`

character, which
accepts one or multiple digits, to a `*`

character, which
accepts zero or multiple digits:

`-?\p{Sc}?\d*(\.\d+)?`

Fourth, we only want to accept text strings that only consist of
numbers, meaning that “3a” or “a3” should not qualify. To do so, we
need to add so-called *anchors*. `^`

represents the
beginning of the text string, and is added first, and `$`

represents the end of
the text string, and is added last:

`^-?\p{Sc}?\d*(\.\d+)?$`

Above, the anchors prevent arbitrary text from appearing between the beginning of the text string and the number, and between the number and the end of the text string.

Finally, we need to accept a decimal comma and not a decimal point if
that is the user’s preference. Remember that the symbol the user
prefers is stored as the value of the *DecimalSeparatorText*
hidden field. That means that we need to replace `\.`

in the expression
with `\,`

if
that is the case. (A leading backslash isn’t strictly speaking needed
here, as a comma has no special meaning in this context. It does no
harm, though, so it stays.) We use the & operator to join
together the various pieces of the expression we need.

Putting it all together, this is the formula for the value of the
hidden field *NumberRegex*, which we’ll refer back to from
other formulas later:

## Determining if a text field only consists of numeric lines of text

Users are expected to paste their lookup values into the
*LookupValues* text field and their result values into the
*ResultValues* text field. Below each of them, we add two
hidden logical fields which determine if the text fields only hold
numbers, named *LookupValuesAreNumbers* and
*ResultValuesAreNumbers*.

It is possible to use the REGEXMATCH function directly with text strings consisting of multiple lines. However, we won’t do that here, instead we’ll use the new TEXTSPLIT function to create an array of text strings (with one line per array element) from the original text string. If you’re interested in the other approach, refer to our write-up for the app producing CHOOSE formulas.

TEXTSPLIT breaks a text string apart and returns the pieces as an array of text strings. This formula returns the array { "1", "2", "3" }{ "1"; "2"; "3" }:

In the formula above, “-“ is used as the delimiter between the returned array elements. With a text field accepting multiple lines, the delimiter we are looking for is a line break character, which the new NEWLINE function returns.

As such, this formula breaks the text string *LookupValues*
apart, returning an array of the lines of text it consists of:

To determine if the individual text lines can be considered numbers, we can invoke the REGEXMATCH function directly on the array:

The second parameter is set to `NumberRegex`

, which is
the hidden text field we constructed earlier holding the regular
expression which matches numbers.

REGEXMATCH normally accepts two parameters (and a few optional parameters we won’t discuss here). The first one is a text string to match against a regular expression, given as the second parameter. It returns TRUE if the regular expression matches the given text string and FALSE otherwise.

Our new formula engine ensures that REGEXMATCH can also work with arrays. If the first parameter is an array — which it is, in this case — the result is also an array. Specificially, the result is an array of logical values, with the same size as the original text array. If { TRUE, FALSE, TRUE }{ TRUE; FALSE; TRUE } is returned, that means that the array given to REGEXMATCH contains three elements, and that the first and third elements match the regular expression given as the second parameter, while the second element does not.

So how do we go from a logical array, where each element signifies whether a single line can be interpreted as a number, to a single logical value, where TRUE means that all elements can be interpreted as numbers and FALSE indicates that there is at least one element that does not qualify?

Enter AND. This formula returns FALSE, because there is at least one FALSE element:

AND only returns TRUE if all elements given to it are TRUE. This formula returns TRUE:

Putting it all together, this is the formula associated with the
value of the *LookupValuesAreNumbers* hidden field:

The formula for the *ResultValuesAreNumbers* hidden field is
similar.

## Generating the XLOOKUP formula

The *GeneratedXLOOKUPFormula* field contains the generated
formula. The formula of the value of this field uses the & operator heavily to
join text strings together. It makes use of the hidden field
*ParameterSeparatorText* to ensure that it inserts the proper
parameter separator (a comma if a decimal point is used and a
semicolon otherwise).

There are two final hidden fields we need in order to make the
formula for the value of the *GeneratedXLOOKUPFormula* field
more readable. They are named *LookupQuoteIfNeeded* and
*ResultQuoteIfNeeded*.

The idea is that these hidden fields should contain a single quotation mark (“), but only if the corresponding text field does not exclusively contain lines that can be interpreted as numbers. These hidden text fields can then be referenced from the final formula as an easy means of inserting quotation marks, but only if they are needed.

Here’s the formula for the value property of the
*LookupQuoteIfNeeded* field:

The second parameter to IF may look a bit odd, but that’s actually how a single quotation mark is written in Calcapp (and in spreadsheets). Of the four quotation marks, the first and the last demarcate the text string. To write a single quotation mark, two quotation marks must be written, because a single one would end the text string prematurely.

Before we look at the final formula, let’s look at the expected results from the app. Let’s say that we have a very simple table, consisting of only two rows and two columns. Here it is:

Name | Age |
---|---|

Louise | 53 |

Ahmed | 28 |

The user is expected to paste the contents of the first column into
the *LookupValues* field, and the contents of the second
column into the *ResultValues* field. We then want the app to
generate this formula if the preferred decimal separator is a decimal
point:

We want the app to generate this formula if the preferred decimal separator is a decimal comma:

Before we move on to the final formula that generates the formulas
above, let’s consider what it takes to generate just the arrays to
XLOOKUP. We have already concluded that we can convert the value of
the *LookupValues* text field (complete with line breaks) to an
array of its constituent lines with this formula:

How do we put it back together to form a text string, once more, this time with different delimiters? Remember that the delimiter used above is a line break. This time around, we want the delimiter to be a parameter separator character (a comma or a semicolon), followed by a space for readability.

To achieve this, we use the TEXTJOIN function. This formula does exactly what is described in the preceding paragraph:

(The second parameter, TRUE, instructs TEXTJOIN to ignore blank values.)

Now, what if the lookup values consist of text strings (meaning that
the value of the *LookupValuesAreNumbers* hidden field is
FALSE)? Then we need to use not only `,`

or `;`

as delimiters, but
`" , "`

or
`" ; "`

(so
that the elements of the array end with and start with a quotation
mark).

This formula does just that:

To truly generate an array, though, we also need the begin and end
braces, `{`

and `}`

.
Also, we potentially need to add quotation marks to the beginning of
the first element and to the end of the last element.

Here’s the formula for a hidden text field named
*GeneratedLookupArray*:

There is a similar hidden text field named
*GeneratedResultArray*.

Here’s the complete formula, combining the values of
*GeneratedLookupArray* and *GeneratedResultArray*:

## One last thing

At first blush, the complete formula above appears to work very well. It works for plain numbers and it works for text strings.

However, what about numbers that are preceded by currency symbols?
The *NumberRegex* field treats them as numbers, which leads
this app to generate an XLOOKUP formula such as this one:

That’s not a valid Calcapp formula — numbers can’t be preceded by currency symbols.

To fix this problem, we need to take steps to remove the currency
symbols. To do so, we’ll modify the TEXTSPLIT parts of the formulas
for the values of *GeneratedLookupArray* and
*GeneratedResultArray*. Here is part of the formula for the
value of *GeneratedResultArray*:

(We’ll show this exclusively for result values, the formula for the
value of *GeneratedLookupArray* also needs to be modified.)

First off, if the result values are not numbers, the formula above is exactly what we need — no currency symbols should be removed. We’ll achieve this with IF:

We’ll need to replace `…`

in the formula above
with a formula fragment that removes currency symbols.

To achieve this, we’ll rely on another regular expression:

`([\P{Sc}]*)\p{Sc}?(.*)`

`\P{Sc}`

means “not a currency symbol” and `\p{Sc}`

means “a
currency symbol.” What this regular expression does is that it
divides a text string into two halves (captured by two *capturing
groups*), one consisting of the text that appears before a
currency symbol and one consisting of the text that appears after.
The currency symbol itself appears between the two halves and is not
part of either capturing group.

(Capturing groups appear between parentheses in regular expressions and capture text, in this case the two halves mentioned above.)

The REGEXEXTRACT function can be used to extract the text captured by a capturing group. To use it, we need to provide the number of the capturing group as the third parameter (1 or 2, in this case, as there are two capturing groups).

Using REGEXEXTRACT would work — but would not be ideal, as we’d then need to repeat the REGEXEXTRACT invocation (once with 1 as the third parameter and once with 2 as the second parameter) and merge together the results using the & operator.

The new REGEXEXTRACTALL function is different from REGEXEXTRACT in that it returns the contents of all capturing groups as an array. That’s perfect, as we can then merge together the two halves (without the currency symbol) not with &, but with TEXTJOIN, with a delimiter set to """".

Here’s the formula with the `…`

part filled in:

*)\p{Sc}?(.*)")), TEXTSPLIT(ResultValues, NEWLINE()))IF(ResultValuesAreNumbers; TEXTJOIN(""; FALSE; REGEXEXTRACTALL(TEXTSPLIT(ResultValues; NEWLINE()); "([\P{Sc}]

*)\p{Sc}?(.*)")); TEXTSPLIT(ResultValues; NEWLINE()))

Here’s the complete formula for *GeneratedResultArray*:

*)\p{Sc}?(.*)")), TEXTSPLIT(ResultValues, NEWLINE()))) & ResultQuoteIfNeeded & " }""{ " & ResultQuoteIfNeeded & TEXTJOIN(ResultQuoteIfNeeded & ParameterSeparatorText & " " & ResultQuoteIfNeeded; TRUE; IF(ResultValuesAreNumbers; TEXTJOIN(""; FALSE; REGEXEXTRACTALL(TEXTSPLIT(ResultValues; NEWLINE()); "([\P{Sc}]

*)\p{Sc}?(.*)")); TEXTSPLIT(ResultValues; NEWLINE()))) & ResultQuoteIfNeeded & " }"