Sample app: Import data from spreadsheet columns

Our latest sample app converts spreadsheet column data to formulas for use with drop-down fields. This app uses regular expressions heavily and this post explains in detail how it was built.

November 1, 2021 update: Calcapp now supports the XLOOKUP function (blog post), along with an app that generates XLOOKUP formulas from spreadsheet tables. Generating CHOOSE formulas is sometimes still the better choice, though, but XLOOKUP provides more flexibility. Refer to the links for details.

Original text: We publish sample apps as a means of demonstrating Calcapp features and techniques that you may want to use in your own apps that you distribute to your own users. This sample app is different — the intended user is you, the app creator, and it helps you convert spreadsheet data for use with drop-down fields.

Read this blog post to get a high-level overview of the Calcapp features that work together to enable spreadsheet data to be imported. There’s also a video tutorial explaining the entire workflow. This post explains how to use the app converting spreadsheet data and how it was written.

To summarize, Calcapp has gained support for the new formula function CHOOSE, which can be used together with the new Index property of drop-down fields. These formulas tend to be much shorter and more maintainable than the formulas you’d write previously using the IF formula function. What’s more, this app generates them automatically from data you paste from spreadsheets.

Here’s an embedded version of the table converter app, which you can also run as a stand-alone app:

Paste column data from a spreadsheet into the Column data field, with one value per line. If you like, you can select a sample data set from the Sample data drop-down field to see how the sample data is converted into formulas. Select Don’t use sample data from the drop-down field when you’re ready to use real data.

The Field name field lets you enter the name of your drop-down field. The generated CHOOSE formulas access its Index property, so if you don’t enter it you’ll need to adjust the generated formulas manually.

The Commas field lets you specify how commas in the pasted data are handled. If the data you’re pasting uses commas as thousands separators (which is standard in the English-speaking world), keep the default setting of Thousands separators. If your data uses a comma as a decimal separator, select Decimal separators.

When you paste data, you’ll find that two CHOOSE formulas are generated. The one in the field labeled CHOOSE formula for number fields is intended for numeric data. Lines that don’t appear to contain numeric data are rendered as BLANK() in the generated formula, meaning that a blank value is selected. The formula in the field labeled CHOOSE formula for text fields encloses all the lines in the pasted data in quotation marks and is intended for text fields.

Under the hood

When we added support for the CHOOSE function and the Index property of drop-down fields, we also wanted to write a converter automatically generating the CHOOSE formulas. The idea of writing it using traditional programming seemed like a big project, though, and we tried creating the converter using Calcapp.

While early results were promising, it soon became apparent that Calcapp wasn’t powerful enough to handle all aspects of the conversion. We either needed to abandon Calcapp and use traditional programming or add the missing features to Calcapp. We chose the latter approach.

The fruits of that work are the three new formula functions for processing text: REGEXMATCH, REGEXEXTRACT and REGEXREPLACE. These three new functions use so-called regular expressions to determine if text matches a specific regular expression, to extract text matching a regular expression or to replace text matching a regular expression.

This app makes heavy use of the REGEXREPLACE function to transform the pasted data into CHOOSE formulas. If you’re not familiar with regular expressions, please read our introduction to them before you continue reading.

You’re encouraged to view the app and its formulas as you read along. It’s available as a template when you create a new app under the Sample: Column converter name.

Note: The rest of this post is quite long, but it does explain in detail how the column converter app is implemented. Feel free to digest it in several sittings, or skip it altogether if you’re just interested in using the app and not in how it was written.

Line breaks

When you enter a line of text and press Enter to start a new line, your computer inserts a so-called line break character, also called a newline, end of line or line feed character. This character is invisible and only serves to instruct your computer to display the characters following it on a new line.

As line breaks are used pervasively by this app, we define a new hidden text field which uses a formula to produce a line break named LineBreak. Here’s the formula:

CHAR(10)

The CHAR function returns a single character, identified by a so-called code point. Every character in existence has a numeric code point and this function can be used to return some of the more esoteric ones. A line break character is invisible, so it must be generated by a function. Its numeric code happens to be 10.

(If you’re used to spreadsheets, all this probably sounds familiar, as they handle this the same way.)

Validating field names

The Settings group has a visible field named EnteredFieldName, with the initial (default) value “Field1”. There is a formula associated with the Valid property of the field, which determines whether the entered value is considered valid:

(LEN(EnteredFieldName) > 0) && REGEXMATCH(EnteredFieldName, "^[\p{L}_][\p{L}0-9_]*$", TRUE)

This formula states that the value should only be considered valid if its length is greater than zero and the REGEXMATCH function returns TRUE. (If you’re not familiar with the && operator, it’s specific to Calcapp. Field1 && Field2 is fully equivalent to AND(Field1, Field2))

REGEXMATCH returns TRUE if the text value of the first parameter matches the regular expression given as the second parameter. The third parameter is set to TRUE to indicate that case should be ignored.

What the ^[\p{L}_][\p{L}0-9_]*$ regular expression does is that it matches text strings which are valid Calcapp names. A valid name starts with a letter and is followed by an arbitrary number of letters, numbers or underscores.

The regular expression starts with the ^ character and ends with the $ character to ensure that the value fully matches the regular expression. Otherwise, the function would happily return TRUE if the value contains a valid Calcapp name, as opposed to being entirely made up of one. Without the ^ and $ characters, REGEXMATCH would return TRUE for the value “!#€%Field1#€#”, which contains a valid Calcapp name but shouldn’t be considered valid when considered in its entirety.

Also, note that \p{L} is used instead of [a-z] to match letters. [a-z] only matches the letters of the English alphabet, whereas \p{L} matches any letter written in any language, including Japanese (“日本語”) and German (“ö”).

The field name used from other formulas isn’t EnteredFieldName, it’s a hidden text field named FieldName. Here’s its formula:

IF(EnteredFieldName.Valid, EnteredFieldName, "Field1")

In other words, the field name used by other formulas is the entered field name only if it is valid. Otherwise, the generic field name Field1 is used.

Handling the decimal separator setting

To determine if commas are considered thousands separators or decimal separators, a simple text drop-down field name CommaStatus is used. Other formulas use the hidden logical field CommasAreDecimalSeparators to determine if a comma is a decimal separator (FALSE implies that commas are thousands separators). Doing so ensures that other formulas don’t need to concern themselves with the exact wording used in the drop-down field. Here’s the simple formula:

CommaStatus = "Decimal separators"

Handling the sample data setting

There’s another text drop-down field, named SampleData, where users can specify whether they want to use sample data as opposed to typing it in themselves. There are four values in the drop-down menu, the first of which reads Don’t use sample data.

There are a number of hidden fields which ensure that other formulas don’t need to concern themselves with the values of the drop-down field. UseSampleData is a hidden logical field which is used to determine if sample data should be used. Here’s its formula:

SampleData <> "Don't use sample data"

The <> operator means “is not equal to.” Calcapp also allows you to write this operator as !=, which should be more familiar if you’re more used to programming than to spreadsheets.

The actual sample data is contained in three hidden text fields: SampleNumbers, SampleTextStrings and SampleCurrency. Here’s an excerpt from the formula for the SampleCurrency field:

"£6.25" & LineBreak & "£5.20" & LineBreak & "£5.20" [...]

The LineBreak field is referenced from the formula to insert line breaks. The & operator is used to join text strings together.

There’s another hidden text field named SampleText, which holds the sample text selected by the user or is a blank value if sample text should not be used. Here’s its formula:

CHOOSE(SampleData.Index - 1, SampleNumbers, SampleTextStrings, SampleCurrencyStrings)

The formula makes good use of the CHOOSE function. It subtracts one from the index of the selected value, because the first value is Don’t use sample data and as a result, the value for selecting sample numbers is the second value and not the first value. The CHOOSE function expects its first parameter to be 1 if sample numbers should be selected, though, necessitating the subtraction.

Entering column data

Column data is entered in a field named ColumnData. Its Visible property makes it hidden if sample data is used, though:

!UseSampleData

(!UseSampleData is equivalent to NOT(UseSampleData). ! is yet another Calcapp-specific operator.)

There is another field for column data, SampleColumnData, one that is only shown if sample data is selected. As you might expect, its Visible property is associated with the following formula:

UseSampleData

Its value is associated with this formula:

SampleText

In other words, the sample data is shown if the user indicates that he or she would like to use sample data. Otherwise, the user is given the opportunity to enter his or her own data.

An additional hidden text field, named ActualColumnData, is used to ensure that the other formulas don’t need to concern themselves with whether sample data is used. Here’s its formula:

IF(UseSampleData, SampleColumnData, ColumnData)

It selects the sample data if the user has indicated that preference or the column data manually entered by the user otherwise.

Generating CHOOSE formulas for number fields

Converting column data to CHOOSE formulas for number fields is done in two steps. First, each line is transformed to a more suitable format through the ColumnDataAsNumbers field. You’re encouraged to make it visible using Calcapp Creator when you’re experimenting with the app to more easily follow along.

Let’s say that the user enters the following data:

£5.73
£10.41

£11.02
£11.02
test
£13.97
£13..97
£14.08

The formula of the ColumnDataAsNumbers field then transforms this data into the following:

5.73
10.41

11.02
11.02

13.97

14.08

As you can see, the currency symbols are gone, as is the line only containing the text string “test” and the malformed number “13..97”. In other words, the ColumnDataAsNumbers field turns the lines of text into pure numbers. It could not transform the “test” and “£13..97” lines, so they were removed.

Processing commas

Here’s the formula associated with the ColumnDataAsNumbers field:

REGEXREPLACE(REGEXREPLACE(SUBSTITUTE(ActualColumnData, ",", IF(CommasAreDecimalSeparators, ".", "")), "[^-.\d" & LineBreak & "]", "", TRUE), "^(?!-?\d+(?:\.\d+)?$).*$", "", TRUE, FALSE, TRUE)

There’s a lot to cover here, so let’s unpack it one function at a time. The passage marked in yellow above uses the SUBSTITUTE function to either replace all commas with a period or remove them altogether. If commas should be interpreted as decimal separators, they are replaced with periods. (Numbers in Calcapp formulas are always written with a period for a decimal separator.) If commas should not be interpreted as decimal separators, they are thousands separators and are removed.

Removing characters which can’t be part of numbers

Next, let’s turn our attention to the passages marked in yellow below:

REGEXREPLACE(REGEXREPLACE(SUBSTITUTE(ActualColumnData, ",", IF(CommasAreDecimalSeparators, ".", "")), "[^-.\d" & LineBreak & "]", "", TRUE), "^(?!-?\d+(?:\.\d+)?$).*$", "", TRUE, FALSE, TRUE)

The intent of this passage is to remove all characters which can’t be part of numbers. All characters other than minus symbols, numbers, periods and line breaks are removed. The intent is to clean up lines containing a mix of numbers and text, like “£13.97”, so that they can be interpreted as pure numbers.

Here’s how it works: the REGEXREPLACE function is passed the text string returned by the SUBSTITUTE function and is invoked with the regular expression resulting from this formula fragment: "[^-.\d" & LineBreak & "]". As you can see, the & operator is used to insert a line break character into the regular expression. [^...] is a negated range, which matches any character which is not part of that range, including letters and currency symbols. In this case, the regular expression matches all characters which are not minus symbols, periods, numbers (\d) or line breaks. The REGEXREPLACE function then replaces all matching characters with the empty text string, "", given as the third parameter. The final fourth parameter, which is set to TRUE, ensures that all matches are replaced and not just the first one.

In effect, the REGEXREPLACE function removes all characters which can’t be part of numbers. The entire “test” text string is removed, as are the currency symbols.

Removing invalid numbers

The malformed number “13..97” still remains, because it only contains characters which are permissible in numbers. These malformed numbers are removed by a final step in the formula, marked in yellow below:

REGEXREPLACE(REGEXREPLACE(SUBSTITUTE(ActualColumnData, ",", IF(CommasAreDecimalSeparators, ".", "")), "[^-.\d" & LineBreak & "]", "", TRUE), "^(?!-?\d+(?:\.\d+)?$).*$", "", TRUE, FALSE, TRUE)

This final regular expression uses a technique not covered in our introduction to regular expressions: negative lookahead, which looks like this: (?!...).

This regular expression uses the same idea as the other REGEXREPLACE regular expression, of matching lines not considered acceptable and replacing them with the empty text string, thereby removing them. A negative lookahead does not match text, it is an assertion that the expression within parentheses does not match the following text. Here’s the part of the regular expression that is part of the negative lookahead:

-?\d+(?:\.\d+)?$

It you look closely, you’ll see that this pattern identifies a number. It may start with a minus symbol (but does not necessarily do so, due to the trailing question mark). It is then followed by one or several numbers. Then, there’s this part:

(?:\.\d+)?

This is the fractional part of the number. Numbers don’t necessarily have a fractional part, so the entire expression in parentheses is followed by a question mark. You may recall that the presence of ?: following the opening parenthesis means that the parenthesis is non-capturing. Inside the parenthesis is \.\d+, meaning a literal period followed by one or several numbers.

The last symbol of the regular expression -?\d+(?:\.\d+)?$ is $, meaning that there should be nothing following a proper number before the end of the line. (“23…” isn’t a valid number, for instance.)

Again, here’s the complete regular expression:

^(?!-?\d+(?:\.\d+)?$).*$

Right before the negative look-ahead is the ^ character, ensuring that nothing that is not a number precedes the number. (“…23” isn’t a valid number.) If the negative look-ahead goes through, it means that what follows is not a number. The regular expression then uses .* to select everything between the beginning of the line to the end of the line ($). REGEXREPLACE then replaces the entire line with the empty text string, effectively removing it.

And there you have it, three functions working in concert to remove invalid entries from the pasted column data.

The final step: generating the CHOOSE formula

Recall that the entered data has now been turned into a list of pure numbers and empty lines. The final order of business is to turn this list into a CHOOSE formula. In other words, this list:

5.73
10.41

11.02
11.02

13.97

14.08

…should be turned into this Calcapp formula:

CHOOSE(Field1.Index, 5.73, 10.41, BLANK(), 11.02, 11.02, BLANK(), 13.97, BLANK(), 14.08)

Gaps in the list are converted to BLANK(), which returns a blank value. It’s important that gaps in the list are not removed, so that the values of the drop-down field are correctly matched with the numbers of the formula. (13.07, for instance, needs to remain the seventh value passed to the CHOOSE function, as it should match the seventh value selected from the drop-down field.)

Here is the final formula, converting the list to a CHOICE formula:

IF(ISDEFINED(ActualColumnData), "CHOOSE(" & FieldName & ".Index" & REGEXREPLACE(", " & SUBSTITUTE(ColumnDataAsNumbers, LineBreak, ", "), ", (?!\d)", ", BLANK()", TRUE) & ")")

Again, let’s go over it one function at a time. The passage marked in yellow above is the IF function, which ensures that Calcapp only attempts to display a CHOICE formula if the ActualColumnData field has a defined value (through the ISDEFINED function). Otherwise, the IF function returns a blank value, ensuring that nothing is displayed, which is preferable to displaying a mangled CHOICE formula.

IF(ISDEFINED(ActualColumnData), "CHOOSE(" & FieldName & ".Index" & REGEXREPLACE(", " & SUBSTITUTE(ColumnDataAsNumbers, LineBreak, ", "), ", (?!\d)", ", BLANK()", TRUE) & ")")

The call to the SUBSTITUTE function above takes the list produced by the ColumnDataAsNumbers field and replaces all line breaks with commas followed by a single space. Now, the list has been turned into a single text string, with commas instead of line breaks.

Remember the empty lines of the list? Two consecutive blank lines are converted to “, , “ by the SUBSTITUTE formula, which isn’t what we’re after. We’ll use the REGEXREPLACE function to fix this:

IF(ISDEFINED(ActualColumnData), "CHOOSE(" & FieldName & ".Index" & REGEXREPLACE(", " & SUBSTITUTE(ColumnDataAsNumbers, LineBreak, ", "), ", (?!\d)", ", BLANK()", TRUE) & ")")

The passage above uses a regular expression with negative lookahead and the REGEXREPLACE function to match any text consisting of “, “ not immediately followed by a number with the text “, BLANK()”, thereby fixing our problem. “, , “ is now transformed into “BLANK(), BLANK(), “.

IF(ISDEFINED(ActualColumnData), "CHOOSE(" & FieldName & ".Index" & REGEXREPLACE(", " & SUBSTITUTE(ColumnDataAsNumbers, LineBreak, ", "), ", (?!\d)", ", BLANK()", TRUE) & ")")

Finally, the formula uses the & operator to join the produced text string together with the text “CHOOSE(“, followed by the name of the field and a closing parenthesis.

Generating CHOOSE formulas for text fields

Luckily, generating CHOOSE formulas for text fields is far simpler than doing the same for number fields. As with number fields, converting column data to CHOOSE formulas for text fields is done in two steps. First, each line is transformed to a more suitable format through the ColumnDataAsText field. Again, you’re encouraged to make it visible using Calcapp Creator when you’re experimenting with the app to more easily follow along.

Let’s say that the user enters the following data:

Indigo
Ivory
Khaki
Lavender
Silver
"Snow"
Tan

The formula of the ColumnDataAsText field then transforms this data into the following:

"Indigo"
"Ivory"
"Khaki"
"Lavender"
"Silver"
"""Snow"""
"Tan"

Quotation marks are added because literal text strings in Calcapp formulas must be enclosed by them. To represent literal quotation marks in formulas, write two consecutive quotation marks (which are interpreted as one quotation mark instead of ending the quote). That’s why “Snow” is written as “"”Snow”””.

Processing the column data

Here’s the formula associated with the ColumnDataAsText field:

REGEXREPLACE(REGEXREPLACE(SUBSTITUTE(ActualColumnData, """", """"""), "^|$", """", TRUE, FALSE, TRUE), "^""$", """""", TRUE, FALSE, TRUE)

The passage in yellow above ensures that a single, literal quotation mark is converted to two quotation marks. There are many quotation marks in the formula above, because of Calcapp’s requirement that literal quotation marks are written as two consecutive quotation marks.

REGEXREPLACE(REGEXREPLACE(SUBSTITUTE(ActualColumnData, """", """"""), "^|$", """", TRUE, FALSE, TRUE), "^""$", """""", TRUE, FALSE, TRUE)

Next up is the REGEXREPLACE function, which is given the result of the SUBSTITUTE function as input and adds a quotation mark to the begininng of all lines and another quotation mark to line endings. It does so using a regular expression matching the beginning or end of every line: ^|$. (The | character should be read as “or” and is known as alternation.) The first TRUE parameter asks that all matching lines are considered instead of just the first one, the next FALSE parameter indicates that case-insensitive matching should not be used and the final TRUE parameter asks that the ^ and $ characters should be line-relative as opposed to relative to the entire text string.

There is a problem with this use of REGEXREPLACE function, though. Truly empty lines only consist of a line break character and the ^ and $ characters as a result match at the same position. The result is that only one quotation mark is added.

REGEXREPLACE(REGEXREPLACE(SUBSTITUTE(ActualColumnData, """", """"""), "^|$", """", TRUE, FALSE, TRUE), "^""$", """""", TRUE, FALSE, TRUE)

This final REGEXREPLACE function fixes the problem cited above and replaces lines consisting solely of one quotation mark with lines consisting of two consecutive quotation marks.

The final step: generating the CHOOSE formula

The entered data has now been turned into a list of properly quoted strings. The final order of business is to turn this list into a CHOOSE formula. In other words, this list:

"Indigo"
"Ivory"
"Khaki"
"Lavender"
"Silver"
"""Snow"""
"Tan"

…should be turned into this Calcapp formula:

CHOOSE(Field1.Index, "Indigo", "Ivory", "Khaki", "Lavender", "Silver", """Snow""", "Tan")

Here is the final formula, converting the list to a CHOICE formula:

IF(ISDEFINED(ActualColumnData), "CHOOSE(" & FieldName & ".Index," & SUBSTITUTE(ColumnDataAsText, LineBreak, ", ") & ")")

This formula uses the same principles as the formula converting number lists into CHOOSE formulas. The IF function ensures that a value is only produced if the ColumnDataAsText field contains a defined value through the ISDEFINED function. The call to the SUBSTITUTE function takes the list and replaces all line breaks with commas followed by a single space. Now, the list has been turned into a single text string, with commas instead of line breaks.

Finally, the formula uses the & operator to join the produced text string together with the text “CHOOSE(“, followed by the name of the field and a closing parenthesis.

« Feature: Customize the loading screen Feature: Edit drop-down values as text »