Strings in Elixir
First, there is no String type in Elixir. Strings don’t get their own type, but are represented using other built-in Elixir/Erlang types.
There are two different string representations in Elixir:
- Binaries
- Character lists
These two string representations in Elixir are quite different. You need to be cognisant of the string representation that you are using as this affects the operations that you can perform on the string and how you process it.
Strings as binaries
If you create a string using " the string is represented as a UTF-8 encoded binary. Most of the common operations you’ll want to perform on strings live in the String module, which operates on the binary string representation.
This is generally the string representation you want to use.
Let’s create a string and check that it’s a binary
> s = "abc"
"abc"
> is_binary(s)
true
We can call any of the functions from the String module on this binary
> String.capitalize(s)
"Abc"
> String.reverse(s)
"cba"
> String.split(s, "b")
["a", "c"]
We can’t use hd to get the first element of the string
> hd s
** (ArgumentError) errors were found at the given arguments:

  * 1st argument: not a nonempty list

    :erlang.hd("abc")
because this isn’t a list - it’s a binary.
> i s
Term
  "abc"
Data type
  BitString
Byte size
  3
Description
  This is a string: a UTF-8 encoded binary. It's printed surrounded by
  "double quotes" because all UTF-8 encoded code points in it are printable.
Raw representation
  <<97, 98, 99>>
Reference modules
  String, :binary
Implemented protocols
  Collectable, IEx.Info, Inspect, List.Chars, String.Chars
We can see here that the raw representation is <<97, 98, 99>> - i.e. it’s a binary containing the bytes 97, 98, 99, which for these ASCII characters are also their code points.
Although hd doesn’t work, we can get the first character of the string using String.first
> String.first(s)
"a"
We can get the integer code point of a character using the ? operator
> ?a
97
> ?b
98
> ?c
99
We can check the code points in the string
> String.codepoints(s)
["a", "b", "c"]
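For an ASCII-only string like "abc", bytes, code points and characters line up one-to-one, which hides the fact that the binary is UTF-8 encoded. The following is a small sketch, using an example string of my own choosing that is not from the text above, showing where the counts diverge once a non-ASCII character is involved.

```elixir
# "\u00E9" is the single code point U+00E9 (é), written as an escape so the
# example does not depend on how the source file itself is encoded.
s = "h\u00E9llo"

bytes = byte_size(s)            # number of bytes in the binary
chars = String.length(s)        # number of characters (graphemes)
points = String.codepoints(s)   # one single-character string per code point
raw = :binary.bin_to_list(s)    # the underlying bytes as a list of integers

IO.inspect(bytes, label: "bytes")    # 6, because é takes two bytes in UTF-8
IO.inspect(chars, label: "chars")    # 5
IO.inspect(raw, label: "raw bytes")  # [104, 195, 169, 108, 108, 111]
```

This is why functions from the String module are usually a safer choice than byte-level operations when the text might not be plain ASCII.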
And we can get a list of the integer codes of each character using
> String.to_charlist(s)
'abc'
Note here that we get back a single-quoted string. Although this looks like a string it’s actually a character list.
We can call hd on it
> String.to_charlist(s) |> hd()
97
And we can’t use it with a function that expects a binary string
> String.to_charlist(s) |> String.first()
** (FunctionClauseError) no function clause matching in String.first/1

    The following arguments were given to String.first/1:

        # 1
        'abc'

    Attempted function clauses (showing 1 out of 1):

        def first(string) when is_binary(string)

    (elixir 1.12.3) lib/string.ex:1876: String.first/1
Strings as character lists
Strings can also be represented as lists of characters. This is where things can get confusing if you’re not expecting it. If you create a string with
' you’ll get a character list. This is a list of the individual character codes.
> l = 'abc'
'abc'
> hd l
97
> l
'abc'
> i l
Term
  'abc'
Data type
  List
Description
  This is a list of integers that is printed as a sequence of characters
  delimited by single quotes because all the integers in it represent printable
  ASCII characters. Conventionally, a list of Unicode code points is known as a
  charlist and a list of ASCII characters is a subset of it.
Raw representation
  [97, 98, 99]
Reference modules
  List
Implemented protocols
  Collectable, Enumerable, IEx.Info, Inspect, List.Chars, String.Chars
So even though we see 'abc' in iex, the underlying representation is a list of character codes, [97, 98, 99]. What’s happening is that when iex sees a list of integers in which every integer is the code of a printable ASCII character, it prints the characters instead.
If we add a code that isn’t a printable ASCII character to the list, we see the underlying integers.
> [123456 | l]
[123456, 97, 98, 99]
So what if you’re actually working with a list of integers?
Elixir will always treat a list of integers as a list of integers, but iex may print it as a string if all the integers are printable characters. This can be annoying.
You can disable this behaviour with
> IEx.configure(inspect: [charlists: :as_lists])
:ok
> 'abc'
[97, 98, 99]
(You can add this to ~/.iex.exs if you always want to treat lists this way)
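If you would rather not change the iex configuration globally, the same charlists option can be passed to inspect for a single call. The sketch below also uses List.ascii_printable?/1 to check whether a list is one that would be printed as characters by default. The variable names are only illustrative.

```elixir
ints = [97, 98, 99]

# Force the list to be rendered as integers for this one call only.
as_list = inspect(ints, charlists: :as_lists)
IO.puts(as_list)   # prints [97, 98, 99]

# IO.inspect accepts the same option, which is handy when debugging.
IO.inspect(ints, charlists: :as_lists)

# Check whether every element is a printable ASCII character code.
all_printable = List.ascii_printable?(ints)
mixed_printable = List.ascii_printable?([123456 | ints])
IO.inspect({all_printable, mixed_printable})   # {true, false}
```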
Converting between binary strings and charlists
As we saw above, you’ll get an error if you try to pass a character list to a function that expects a binary string, and vice versa.
These are the 2 functions you need to convert between the two representations.
> List.to_string([97, 98, 99])
"abc"
> String.to_charlist("abc")
'abc' # or [97, 98, 99] depending on your iex config
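When a function might receive either representation, for example because it is called with values coming back from an Erlang library, one option is to normalise the input at the boundary. The following is a minimal sketch of that idea. The module and function names (Text.normalize/1) are hypothetical and not part of Elixir.

```elixir
defmodule Text do
  # Hypothetical helper: returns a binary string whichever
  # representation it is given.

  # Already a binary string, so return it unchanged.
  def normalize(string) when is_binary(string), do: string

  # A charlist (list of character codes), so convert it to a binary.
  def normalize(charlist) when is_list(charlist), do: List.to_string(charlist)
end

IO.inspect(Text.normalize("abc"))                    # "abc"
IO.inspect(Text.normalize([97, 98, 99]))             # "abc"
IO.inspect(Text.normalize(~c"abc") |> String.first()) # "a"
```

The guards mirror the is_binary check seen in the String.first/1 error earlier: each clause only matches the representation it knows how to handle.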
Pattern match on strings
Finally, it’s worth mentioning pattern matching on binary strings. You can use <> to pattern match against a binary string.
"ab" <> final_char = "abc"
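To make the match above concrete, here is a short sketch showing what gets bound, how the same idea works in function heads, and how binary syntax can pull the first code point off the front of a string. The Command module and its function are hypothetical names used only for illustration.

```elixir
# The literal prefix must match; the remainder is bound to the variable.
"ab" <> final_char = "abc"
IO.inspect(final_char)   # "c"

defmodule Command do
  # Hypothetical example: dispatch on a known prefix in the function head.
  def parse("say " <> message), do: {:say, message}
  def parse(_other), do: :unknown
end

IO.inspect(Command.parse("say hello"))   # {:say, "hello"}
IO.inspect(Command.parse("jump"))        # :unknown

# Binary syntax can take the first code point (as an integer) and the rest.
<<first::utf8, rest::binary>> = "abc"
IO.inspect({first, rest})   # {97, "bc"}
```

Note that in a match the left-hand side of <> has to be a literal binary, so this technique works for known prefixes but not for matching on an unknown prefix with a known suffix.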