Strings in Elixir

Strings were one of the first things I found confusing about Elixir. I’ve worked professionally in a bunch of different languages: Ruby, JavaScript, Swift, Python, Scala, Clojure, Java. For the most part, strings work the same way in each of them. I mistakenly assumed that strings would be the same in Elixir. They’re not.

First, there is no dedicated string data type in Elixir. Strings don’t get their own type, but are represented using other built-in Elixir/Erlang types.

There are two different string representations in Elixir:

  1. Binaries
  2. Character lists (charlists)

These two string representations in Elixir are quite different. You need to be cognisant of the string representation that you are using as this affects the operations that you can perform on the string and how you process it.
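As a rough illustration of why the representation matters, the sketch below holds the same text both ways and counts its characters. The variable names are made up for the example, and it assumes (as the following sections describe) that a double-quoted literal is a binary and that String.to_charlist/1 returns the list form.

```elixir
# The same text in both representations.
binary = "abc"                          # double quotes: a UTF-8 binary
charlist = String.to_charlist(binary)   # a list of integer code points

# Each representation has its own guard...
true = is_binary(binary)
true = is_list(charlist)

# ...and its own way of measuring length.
binary_length = String.length(binary)   # counts graphemes in a binary
charlist_length = length(charlist)      # counts elements in a list

IO.inspect({binary_length, charlist_length})   # {3, 3}

# The two representations are never equal, even for the same text.
IO.inspect(binary == charlist)                 # false
```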

Strings as binaries

If you create a string using double quotes ("), the string is represented as a UTF-8 encoded binary. Most of the common operations you’ll want to perform on strings live in the String module, which operates on the binary representation.

This is generally the string representation you want to use.

Let’s create a string and check that it’s a binary:

> s = "abc"
"abc"
> is_binary(s)
true

We can call any of the functions from the String module on this binary:

> String.capitalize(s)
"Abc"
> String.reverse(s)
"cba"
> String.split(s, "b")
["a", "c"]

We can’t use hd to get the first element of the string

> hd s
** (ArgumentError) errors were found at the given arguments:

  * 1st argument: not a nonempty list

    :erlang.hd("abc")

because this isn’t a list; it’s a binary.

> i s
Term
  "abc"
Data type
  BitString
Byte size
  3
Description
  This is a string: a UTF-8 encoded binary. It's printed surrounded by
  "double quotes" because all UTF-8 encoded code points in it are printable.
Raw representation
  <<97, 98, 99>>
Reference modules
  String, :binary
Implemented protocols
  Collectable, IEx.Info, Inspect, List.Chars, String.Chars

We can see here that the raw representation is <<97, 98, 99>>, i.e. it’s a binary containing the bytes 97, 98, 99. For ASCII characters like these, each byte is the same as the character’s code point.
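The bytes and the characters only line up one-to-one for ASCII text. Here is a small sketch of what happens with a non-ASCII character, assuming the accented letter U+00E9 as the sample (my choice, written as an escape so the example doesn't depend on file encoding).

```elixir
# "\u00E9" is the letter e with an acute accent.
accented = "\u00E9"

bytes = byte_size(accented)        # number of bytes in the binary
chars = String.length(accented)    # number of characters (graphemes)

IO.inspect({bytes, chars})               # {2, 1}
IO.inspect(accented == <<195, 169>>)     # true: its two UTF-8 bytes

# Appending a null byte makes the binary unprintable, so inspect
# falls back to showing the raw bytes.
IO.inspect("abc" <> <<0>>)               # <<97, 98, 99, 0>>
```

So byte_size/1 and String.length/1 can disagree, which is one more reason to reach for the String module rather than treating the binary as a sequence of bytes.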

Since hd doesn’t work we can get the first element of the string using

> String.first(s)
"a"

We can get the integer representation of a character using

> ?a
97
> ?b
98
> ?c
99
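The ? syntax is just another way of writing an integer, so the result can be used anywhere an integer can. A quick sketch (variable names are mine):

```elixir
# ?a is the integer 97, nothing more.
true = (?a == 97)

# Because code points are integers, arithmetic works on them.
next_letter = ?a + 1                   # 98, the code point for "b"

# Integers up to 255 can be placed directly into a binary as bytes.
from_codepoints = <<?a, ?b, ?c>>       # "abc"

# For code points beyond ASCII, the ::utf8 modifier does the encoding.
accented = <<233::utf8>>               # same as "\u00E9"

IO.inspect({next_letter, from_codepoints, accented == "\u00E9"})
```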

We can check the code points in the string

> String.codepoints(s)
["a", "b", "c"]

And we can get a list of the integer codes of each character using

> String.to_charlist(s)
'abc'

Note here that we get back a single-quoted string. Although this looks like a string, it’s actually a character list. (Elixir 1.15 and later print it as ~c"abc" instead.)

We can call hd on it

> String.to_charlist(s) |> hd
97

But we can’t use it with a function that expects a binary string

> String.to_charlist(s) |> String.first
** (FunctionClauseError) no function clause matching in String.first/1

    The following arguments were given to String.first/1:

        # 1
        'abc'

    Attempted function clauses (showing 1 out of 1):

        def first(string) when is_binary(string)

    (elixir 1.12.3) lib/string.ex:1876: String.first/1

Strings as character lists

Strings can also be represented as lists of characters. This is where things can get confusing if you’re not expecting it. If you create a string with single quotes ('), you’ll get a character list: a list of the individual character codes.

> l = 'abc'
'abc'
> hd l
97
> l
'abc'
> i l
Term
  'abc'
Data type
  List
Description
  This is a list of integers that is printed as a sequence of characters
  delimited by single quotes because all the integers in it represent printable
  ASCII characters. Conventionally, a list of Unicode code points is known as a
  charlist and a list of ASCII characters is a subset of it.
Raw representation
  [97, 98, 99]
Reference modules
  List
Implemented protocols
  Collectable, Enumerable, IEx.Info, Inspect, List.Chars, String.Chars

So even though we see 'abc' in iex, the underlying representation is a list of character codes: [97, 98, 99]. When iex sees a list of integers in which every integer is the code of a printable ASCII character, it prints the characters instead. If we add a code that isn’t printable ASCII to the list, we see the underlying integers.

> [123456 | l ]
[123456, 97, 98, 99]

So what if you’re actually working with a list of integers?

Well, Elixir will always treat a list of integers as a list of integers. But iex may print it as a string if all the integers are printable ASCII codes. This can be annoying.

You can disable this behaviour with

> IEx.configure(inspect: [charlists: :as_lists])
:ok
> 'abc'
[97, 98, 99]

(You can add this to ~/.iex.exs if you always want to treat lists this way)
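If you’d rather not change the iex configuration globally, the same :charlists option can be passed to inspect/2 or IO.inspect/2 for a single call. A minimal sketch, where scores is a made-up list standing in for real integer data:

```elixir
scores = [97, 98, 99]   # hypothetical data: plain integers, not text

# inspect/2 accepts the same :charlists option that IEx.configure/1 sets.
as_list = inspect(scores, charlists: :as_lists)
IO.puts(as_list)                                       # [97, 98, 99]

# List.ascii_printable?/1 tells you whether a non-empty list is made up
# entirely of printable ASCII codes, i.e. whether it is likely to be
# displayed as text by default.
IO.inspect(List.ascii_printable?(scores))              # true
IO.inspect(List.ascii_printable?([123456 | scores]))   # false

# IO.inspect/2 takes the option too, which is handy when debugging.
IO.inspect(scores, charlists: :as_lists)
```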

Converting between binary strings and charlists

As we saw above, you’ll get an error if you pass a character list to a function that expects a binary string, and vice versa.

These are the two functions you need to convert between the two representations:

> List.to_string([97, 98, 99])
"abc"
> String.to_charlist("abc")
'abc' # or [97, 98, 99] depending on your iex config
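When a function might receive either representation (Erlang libraries often hand back charlists), one option is to normalise the input up front. The module below is a hypothetical helper I made up for illustration, not something that ships with Elixir.

```elixir
defmodule TextInput do
  # Hypothetical helper: returns a binary string whichever
  # representation it is given.
  def to_binary(text) when is_binary(text), do: text
  def to_binary(text) when is_list(text), do: List.to_string(text)
end

IO.inspect(TextInput.to_binary("abc"))          # "abc"
IO.inspect(TextInput.to_binary([97, 98, 99]))   # "abc"

# With the input normalised, String functions are safe to call.
first = [97, 98, 99] |> TextInput.to_binary() |> String.first()
IO.inspect(first)                               # "a"
```

The built-in to_string/1 does a similar conversion through the String.Chars protocol, so in many cases you may not need a helper of your own.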

Pattern matching on strings

Finally, it’s worth mentioning pattern matching on binary strings. You can use <> to pattern match against the start of a binary string.

"ab" <> final_char = "abc"