Python String Data Type Tutorial
In this tutorial we learn about strings as immutable collections of Unicode characters.
We cover their quoting, escape characters, concatenation, and why f-strings are the best formatting to use.
What is a string?
The string data type is one of the collection data types in Python. A string is an immutable collection of Unicode characters.
How to declare/initialize a string
Python allows us to easily declare strings by just wrapping letters, words or sentences in either single or double quotes.
variable_name = 'string value'
# or
variable_name = "string value"
message = 'Hello World'
print(message)
The string collection
As mentioned before, a string is a collection of single characters.
In some programming languages, like C, there is no dedicated string type and we have to define an array of characters explicitly.
#include <stdio.h>
int main(void)
{
    char message[] = "Hello World";
    /* Pass the text as an argument, not as the format string itself */
    printf("%s\n", message);
    return 0;
}
In Python we can define a string directly. However, a string is still a collection of characters under the hood.
Think of a string as a row in a table with each character in its own separate cell.
If we consider the word “Hello”, this is what it would look like:
H | e | l | l | o |
Each character is also mapped to an index, a number that represents its position in the string.
Considering the word “Hello” again, this is what it would look like:
0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|
H | e | l | l | o |
How to access characters in a string with the indexer
We can use the index number of a character to access its value. We specify the index number of the character we want to access between [ ] (open and close square brackets).
string_variable_name[index]
message = 'Hello'
print("Character 1: ", message[0])
print("Character 2: ", message[1])
print("Character 3: ", message[2])
print("Character 4: ", message[3])
print("Character 5: ", message[4])
In the example above we access each character of the string individually by using its index in the collection.
Note: The index numbers of an indexed collection always start at 0.
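To tie the index table to code, the short sketch below walks over the string with the built-in enumerate() function and len(), neither of which is used elsewhere in this tutorial. It also shows what happens when an index falls outside the string. The variable names are illustrative only.

```python
message = 'Hello'

# enumerate() yields (index, character) pairs, starting at index 0
pairs = list(enumerate(message))
for index, character in pairs:
    print("Index", index, "holds", character)

# len() returns the number of characters, so the last valid index is len - 1
last_index = len(message) - 1
print("Last character:", message[last_index])

# Asking for an index past the end raises an IndexError
try:
    message[5]
except IndexError as error:
    out_of_range = str(error)
    print("IndexError:", out_of_range)
```

The last valid index is always one less than the length of the string, because counting starts at 0.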
String Quotes
As mentioned, we may use either single or double quotes, but the opening and closing quotes of a string must match.
message = 'Hello" #SyntaxError
A string may also not be initialized without quotes.
message = Hello #NameError
Single vs double quotes. When to use which
Both single and double quotes are often used inside strings. If we don’t want to escape quote characters inside the string, we can simply use the opposite quotes as the string wrapper.
# The following string has a single quote
# inside and so is enclosed in double quotes
print("This string is valid because it's enclosed in quotes.")
In the example above, the string is enclosed with double quotes so we don’t need to explicitly escape the single quote.
# The following string has double quotes inside
# and so is enclosed within single quotes
print('"One, two, five!" - King Arthur. Monty Python and the Holy Grail')
In the example above, we use double quotes inside the string so we simply wrap the whole string in single quotes.
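As a rough check that the choice of wrapper is only a matter of convenience, the sketch below writes the same text twice: once with the opposite quotes and once with backslash escapes (covered later in this tutorial). The variable names are made up for the illustration.

```python
# The same text written two ways: opposite quotes vs. backslash escapes
with_double = "It's a valid string."
with_escape = 'It\'s a valid string.'

print(with_double)
print(with_escape)
print(with_double == with_escape)   # True: both hold identical characters

# The same idea for double quotes inside the text
quote_single = '"One, two, five!"'
quote_escape = "\"One, two, five!\""
print(quote_single == quote_escape)  # True
```

Both spellings produce the same string, so picking the opposite quote style simply keeps the source easier to read.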
How to change a string value
Strings are immutable: once a string is created, its characters cannot be changed in place. However, we can assign a new value to the variable that holds the string.
If we try to change a character inside the string, the interpreter will raise an error.
message = 'Hello'
message[0] = 'Y'
In the example above we try to change the H character to a Y, but because a string is immutable the interpreter raises a TypeError.
TypeError: 'str' object does not support item assignment
If we want to change a string at runtime, we have to overwrite it with a new string completely.
message = "Hello"
print(message)
message = "Greetings"
print(message)
In the example above the variable message is reassigned to a new string. The old string itself is never modified, and once nothing refers to it any more Python can discard it.
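When only part of the text should change, one option is to build a new string from pieces of the old one. The sketch below assumes two features this tutorial doesn't cover, slicing (message[1:]) and the str.replace() method, so treat it as a preview rather than a full explanation.

```python
message = 'Hello'
original = message            # a second name for the same string

# Build a new string: 'Y' plus everything from index 1 onwards
message = 'Y' + message[1:]
print(message)                # Yello
print(original)               # Hello (the old string was not modified)

# str.replace() also returns a new string and leaves the original alone
swapped = original.replace('H', 'J')
print(swapped)                # Jello
print(original)               # Hello
```

In both cases the original string stays intact, which is what immutability means in practice.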
How to break a string in source code
Sometimes we need to break a long string over multiple lines in our source code. Python doesn’t allow a single or double quoted string to simply run on to the next line, as some other languages (like C#) do.
message = "Hello
world"
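If we use the example above in a Python script, the interpreter raises a SyntaxError. The exact wording depends on the version: older releases report the message below, while Python 3.10 and later report an unterminated string literal.
SyntaxError: EOL while scanning string literal
The interpreter reaches the End Of Line while the string is still open, and a single or double quoted string must be closed on the line where it starts.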
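To break a string onto multiple lines in the source code, we close the string, end the line with a \ (backslash), and continue with another quoted string on the next line. The backslash tells the interpreter that the statement continues, and adjacent string literals are joined into a single string.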
"line 1" \
"line 2"
Both lines in the syntax example above are enclosed with their own quotes.
message = "Hello " \
"World"
print(message)
In the example above the string is broken up into multiple lines in our source code. However, when we print the string it’s still on a single line.
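A quick way to convince ourselves that the line break exists only in the source code is to compare the continued string with a one-line version. This is a small sketch with illustrative variable names.

```python
one_line = "Hello World"

# Backslash continuation: the two adjacent literals are joined into one string
continued = "Hello " \
            "World"

print(continued)
print(one_line == continued)   # True
print("\n" in continued)       # False: no line break was added to the value
```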
If we wanted to create new lines in print, we would have to use an escape character or triple quotes.
String triple quotes
Python’s triple quotes allow strings to span multiple lines. We can also type tabs, line breaks and quote characters directly into the string without escaping them.
To initialize a triple quote string we wrap our string in 3 single or double quotes.
"""
This string can span
over multiple lines
"""
message = """
Triple quotes not only allow us to
break a string into new lines, both
in the source code and in print, but
it also allows verbatim tabs and
special characters without escaping
them.
Example:
	Tabbed content
@ # $ % ^ & *
"""
print(message)
In the example above, the string is printed exactly as it’s formatted in the source code.
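To see what a triple quote string actually stores, the sketch below compares one with an equivalent string written with the \n escape sequence (covered in the next section) and uses the built-in repr() to make the line breaks visible. The names are illustrative.

```python
triple = """Line one
Line two"""

escaped = "Line one\nLine two"

print(triple == escaped)   # True: the typed line break is stored as \n
print(repr(triple))

# A line break directly after the opening quotes is part of the string too
padded = """
Line one
"""
print(repr(padded))
```

This is why the message in the earlier example prints with an empty line at the top: the string starts with the line break that follows the opening quotes.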
String escape characters
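If we’re not using triple quotes, we can write special characters such as tabs, line breaks and quotes with backslash escape sequences. The same escape sequences also work inside triple quotes.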
tab = 'A horizontal \t tab'
print(tab)
The following table lists some of the commonly used escape characters:
Sequence | Description | Example | Output |
---|---|---|---|
\\ | Backslash | print('\\') | \ |
\' | Single quote ( ' ) | print('\'') | ' |
\" | Double quote ( " ) | print("\"") | " |
\n | Line feed (new line) | print('Hello \n World') | Hello, then World on a new line |
\t | Horizontal tab | print("Hello \t World") | Hello, a tab, then World |
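Each escape sequence is typed as two characters in the source code but is stored as a single character. The sketch below checks this with len() and repr(); the dictionary and its labels are made up for the illustration.

```python
# Each value is written as a two-character escape sequence in the source
samples = {
    "backslash": '\\',
    "single quote": '\'',
    "double quote": "\"",
    "line feed": '\n',
    "tab": '\t',
}

for label, value in samples.items():
    # len() counts the stored characters, repr() shows them unambiguously
    print(label, len(value), repr(value))
```

Every entry reports a length of 1, which confirms that the backslash itself is not part of the stored string, except where we escaped a backslash on purpose.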
String Concatenation
To combine, or concatenate, multiple strings together, we use the + (plus) operator.
"string" + "string"
a = "Hello "
b = "World"
print(a + b)
In the example above, we leave an extra space at the end of Hello as a separator between the words.
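Concatenation also works step by step, for instance when building a sentence from a list of words. The sketch below adds the separator by hand and then shows the str.join() method as an alternative. Lists, loops and join() are not covered in this tutorial, so they are assumptions made for the illustration.

```python
words = ["Hello", "World", "again"]

# Build the sentence piece by piece with the + operator
sentence = ""
for word in words:
    if sentence:                 # add a separator before every word but the first
        sentence = sentence + " "
    sentence = sentence + word
print(sentence)

# str.join() places the separator between the items for us
joined = " ".join(words)
print(joined)
print(sentence == joined)        # True
```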
String Formatting
When we want to combine other data, such as numbers, into a string, Python won’t convert it automatically. We need some sort of string formatting. Fortunately we have several options:
- %-formatting
- .format() method
- f-Strings
As an example let’s look at the following code:
name = "Monty"
age = 30
print('Name: ' + name)
print('Age: ' + age)
When we use the + operator on two strings, the interpreter concatenates them, and when we use it on two numbers, it does arithmetic.
In the example above we mix a str with an int. Python doesn’t guess which operation we mean and doesn’t convert the int to a string for us, so it raises a TypeError.
TypeError: can only concatenate str (not "int") to str
This is where string formatting comes to the rescue.
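For comparison, the failing line can also be repaired without any formatting by converting the number explicitly with the built-in str() function. This is only a sketch of that workaround; it quickly becomes clumsy with several values, which is part of the motivation for the formatting options that follow.

```python
name = "Monty"
age = 30

# str() returns the text form of the integer, so + can concatenate again
name_line = 'Name: ' + name
age_line = 'Age: ' + str(age)

print(name_line)
print(age_line)
```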
%-formatting
The original way to format a string is with the % (percent) operator. A placeholder that starts with % is placed within the string at the location where we want our data to appear, and the % operator after the string supplies the data that replaces it.
print("%x" % value)
The % in the placeholder is followed immediately by a character that denotes the type of data it is a placeholder for.
print("%i" % 30)
In the example above we use an int as a value, so we use %i as the placeholder.
The following table shows the characters to be used in string formatting:
Character | Description |
---|---|
%c | Character |
%s | String conversion via str() prior to formatting |
%i | Signed decimal integer |
%d | Signed decimal integer |
%u | Obsolete, identical to %d |
%o | Octal integer |
%x | Hexadecimal integer using lowercase letters |
%X | Hexadecimal integer using uppercase letters |
%e | Exponential notation with lowercase e |
%E | Exponential notation with uppercase e |
%f | Floating point real number |
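%g | Floating point, uses %e when the exponent is less than -4 or not less than the precision, otherwise decimal format |
%G | Floating point, uses %E when the exponent is less than -4 or not less than the precision, otherwise decimal format |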
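A few of the conversion characters from the table can be tried out as follows. The values are arbitrary and the variable names are illustrative. When a string has more than one placeholder, the values are passed as a tuple.

```python
as_int = "%d" % 30                # '30'
as_hex = "%x" % 255               # 'ff'
as_hex_upper = "%X" % 255         # 'FF'
as_octal = "%o" % 8               # '10'
as_float = "%f" % 3.5             # '3.500000' (six decimals by default)
as_exponent = "%e" % 1234.5       # '1.234500e+03'
as_general = "%g" % 0.00001234    # '1.234e-05' (small exponent, so %e style)
as_char = "%c" % 65               # 'A' (character with code 65)

# Several placeholders take their values from a tuple, in order
combined = "%s is %d years old" % ("Monty", 30)

for result in (as_int, as_hex, as_hex_upper, as_octal, as_float,
               as_exponent, as_general, as_char, combined):
    print(result)
```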
Don't use %-formatting
%-formatting isn’t great because it’s verbose and has quirks that lead to common errors, such as failing to display tuples and dictionaries correctly.
Even the official Python documentation points out these quirks and suggests that the newer formatting options may help avoid them.
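One commonly cited quirk involves tuples. Because the % operator treats a tuple on its right-hand side as a list of separate values, trying to display a tuple with a single placeholder fails. The sketch below reproduces the error and shows the usual workaround; the variable names are illustrative.

```python
point = (3, 4)

# The tuple is unpacked into two values, but there is only one placeholder
try:
    text = "Point: %s" % point
except TypeError as error:
    failure = str(error)
    print("TypeError:", failure)

# Workaround: wrap the value in a one-element tuple (note the trailing comma)
fixed = "Point: %s" % (point,)
print(fixed)
```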
.format() method
Python 2.6 introduced a better way to format strings with the .format() method of the string type. The placeholder fields are marked with open and close curly braces, and the values that replace them are passed as arguments to the method.
"string {} string {} string".format(replace, replace)
message = "Hello {}, welcome to the Python {} tutorial".format("there", "string")
print(message)
In the example above each pair of open and close curly braces is replaced with the argument in the matching position.
We can also refer to the arguments by their position number, which lets us place them in any order in the string.
"string {2} string {0} string {1}".format(replace_0, replace_1, replace_2)
message = "Hello {1}, welcome to the Python {0} tutorial".format("string", "there")
print(message)
We can go a step further and use names in the fields. This lets us pass keyword arguments, access the attributes and items of the objects we pass in, or unpack a dictionary with **.
We won’t go into detail here, but the point is that the .format() method is definitely a step up from %-formatting.
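For readers who want a quick look anyway, here is a brief sketch of those options. It assumes dictionaries and the built-in complex number type, which this tutorial doesn't cover, and all names are made up for the illustration.

```python
# Named fields filled from keyword arguments
by_name = "Hello {name}, you are {age}".format(name="Monty", age=30)

# The same named fields filled by unpacking a dictionary with **
person = {"name": "Monty", "age": 30}
from_dict = "Hello {name}, you are {age}".format(**person)

# Item access on a positional argument (no quotes around the key)
by_item = "Hello {0[name]}".format(person)

# Attribute access: complex numbers expose .real and .imag
number = 3 + 4j
by_attribute = "real={0.real}, imag={0.imag}".format(number)

for result in (by_name, from_dict, by_item, by_attribute):
    print(result)
```

Fields can read attributes and items, but they can't call methods.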
Don't use the .format() method
The .format() method isn’t great because it is still quite verbose, especially when dealing with multiple arguments in longer strings.
f-Strings
Python 3.6 introduced us to f-Strings, or “formatted string literals”. f-Strings are string literals that have curly braces containing the expressions that will be replaced with their respective values. The expressions are formatted using the __format__ protocol.
The syntax is similar to that of .format() but much less verbose. An f-String requires us to prefix the string with the letter f.
variable_name = value
f"string {variable_name} string"
name = "General Kenobi"
trait = "bold"
message = f"{name}. You are a {trait} one."
print(message)
In the example above we specify the variable names that we want to replace with their corresponding values between curly braces.
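To compare the three approaches side by side, the sketch below builds the same message with %-formatting, with .format() and with an f-String. The variable names for the results are illustrative.

```python
name = "General Kenobi"
trait = "bold"

# The same message built three ways
percent_style = "%s. You are a %s one." % (name, trait)
format_style = "{}. You are a {} one.".format(name, trait)
fstring_style = f"{name}. You are a {trait} one."

print(fstring_style)
print(percent_style == format_style == fstring_style)   # True
```

All three produce an identical string. The difference is in how close the values sit to the place where they appear in the text.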
f-String Expressions
Because f-Strings are evaluated at runtime, we can use any valid Python expression inside them.
print(f"3 + 4 = {3 + 4}")
We can also call functions inside of f-Strings.
def is_instrument(instrument):
    # Anything other than a guitar or a piano is rejected
    if instrument not in ("Guitar", "Piano"):
        return f"No Patrick, {instrument} is not an instrument"
    else:
        return "Yes, it is"
print(f"Is Mayonnaise an instrument? {is_instrument('Mayonnaise')}")
We can even use objects created from classes. However, we won’t look at it here because we haven’t covered classes and objects yet and the code would be too confusing.
Multi-line f-Strings
Just as with regular strings, we can break up an f-string into multiple lines in the source code.
We don’t need the backslash to indicate continuation. Instead we can wrap all the strings in parentheses, which also works for regular strings.
(
    f"string line 1"
    f"string line 2"
    f"string line 3"
)
name = "Monty"
language = "Python"
version = 3
topic = "f-Strings"
message = (
    f"Hello there {name}. "
    f"Welcome to the {language} {version} "
    f"{topic} tutorial"
)
print(message)
In the example above, we wrap our f-Strings in open and close parentheses which allows us to break them up into separate lines in the source code.
Each line requires its own wrapping quotes, and every line that contains curly brace expressions needs its own f prefix.
Use f-Strings instead of other string formatting
f-Strings are usually the most readable way to format strings and are typically also faster at runtime than %-formatting and .format().
If you are working with Python 3.6 or later, there is rarely a reason not to use f-Strings instead of .format() or even %-formatting.
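The speed claim can be checked on our own machine with the standard library timeit module. The sketch below is a rough measurement, not a benchmark: the statements, the repeat count and the variable names are arbitrary choices, and the numbers vary between machines and Python versions.

```python
import timeit

name = "Monty"
age = 30

# The same formatting job written three ways, as source code to be timed
statements = {
    "%-formatting": '"%s is %d" % (name, age)',
    ".format()": '"{} is {}".format(name, age)',
    "f-string": 'f"{name} is {age}"',
}

# Run each statement 100,000 times and record the total time in seconds
timings = {}
for label, statement in statements.items():
    timings[label] = timeit.timeit(
        statement,
        globals={"name": name, "age": age},
        number=100_000,
    )

for label, seconds in timings.items():
    print(f"{label:>13}: {seconds:.4f} s")
```

Running it a few times gives a feel for how stable the differences are on a given setup.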
Summary: Points to remember
- Strings can use '' (single), "" (double) or """ (triple) quotes.
- A string is a collection of characters that can be accessed with the indexer.
- A string is immutable, however we can “overwrite” a string variable with a new string at runtime.
- We can use the \ (backslash) to break strings up into multiple lines in the source code.
- Characters such as line breaks, tabs and the enclosing quote character are written with \ (backslash) escapes, unless we type them directly inside a triple quote string.
- Don’t use %-formatting to format strings.
- Use the .format() method to format strings only if you are working with Python 2.6 to 3.5.
- Use f-Strings to format strings if you are working with Python 3.6 and up.