In this post we’ll see how to use strings, collections, and iteration in Python in six steps. In this module of Core Python: Getting Started, we’ll look at a few of the most important collection types in Python: str, bytes, list, and dict. Now that you have an initial understanding of Python’s fundamental scalar types, you’re ready to start exploring some of the collection types. We’ll also introduce you to the for loop, a looping construct commonly used for iterating over collections. We’ll apply all of this to build a small but useful program that demonstrates the expressiveness of Python.
Python includes a rich selection of collection types, which are often completely sufficient for even quite intricate programs without resorting to defining your own data structures. We’ll give enough of an overview of some fundamental collection types now to allow us to write some interesting code. We’ll also be revisiting each of these collection types together with a few additional ones later in the course. Let’s start with these types, str, bytes, list, and dict. Along the way, we’ll also cover Python’s for loops.
String:
Strings in Python have the data type str, spelled s‑t‑r, and we’ve been using them extensively already. Strings are sequences of Unicode code points, and for the most part you can think of code points as being like characters, although they are not strictly equivalent. The sequence of characters in a Python string is immutable, meaning that once you’ve constructed a string, you can’t modify its contents. Literal strings in Python are delimited by quotes; you can use either single quotes or double quotes. You must, however, be consistent.
For example, you can’t use a single quote on one side and a double quote on the other, as the third example below shows. Supporting both quoting styles allows you to easily incorporate the other quote character into the literal string without resorting to ugly escape character gymnastics. Notice that the REPL exploits the same quoting flexibility when echoing the strings back to us: beautiful text strings rendered in literal form, simple elegance. At first sight, support for both quoting styles seems to violate an important principle of Pythonic style from the Zen of Python.
There should be one, and preferably only one, obvious way to do it. In this case, however, another aphorism from the same source, practicality beats purity, takes precedence. The utility of supporting two quoting styles is valued more highly than the alternative, a single quoting style combined with more frequent use of ugly escape sequences, which we’ll encounter shortly.
String Literals
>>> 'This is a string'
'This is a string'
>>> "This is also a string"
'This is also a string'
>>> "inconsistent'
  File "<stdin>", line 1
    "inconsistent'
                 ^
SyntaxError: EOL while scanning string literal
>>> "It's a good thing."
"It's a good thing."
>>> '"Yes!", he said, "I agree!"'
'"Yes!", he said, "I agree!"'
String Literals:
Adjacent literal strings are concatenated by the Python compiler into a single string, which, although at first it seems rather pointless, can be useful for nicely formatted code, as we’ll see later. If you want a literal string containing newlines, you have two options: use multiline strings or use escape sequences. First, let’s look at multiline strings. Multiline strings are delimited by three quote characters rather than one. Here’s an example using three double quotes. Notice how, when the string is echoed back to us, the newlines are represented by the \n escape sequence.
We can also use three single quotes. As an alternative to using multiline quoting, we can just embed the control characters ourselves. To get a better sense of what we’re representing, we can use print to see the string. If you’re working on Windows, you might be thinking that newlines should be represented by the carriage return and newline couplet \r\n. There’s no need to do that with Python, since Python 3 has a feature called universal newline support, which translates between the simple \n and the native newline sequence for your platform on input and output.
Multiline Strings
>>> """This is
... a multiline
... string"""
'This is\na multiline\nstring'
>>> '''So
... is
... this.'''
'So\nis\nthis.'
>>> m = 'This string\nspans multiple\nlines'
>>> m
'This string\nspans multiple\nlines'
>>> print(m)
This string
spans multiple
lines
>>>
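The implicit concatenation of adjacent literals mentioned above is easiest to appreciate in a short sketch. The variable names and the sample text here are purely illustrative and do not come from the course material.

```python
# Adjacent string literals are joined by the compiler into one string.
greeting = "first" "second"

# The same feature lets a long message be split across source lines
# inside parentheses, without using the + operator.
message = ("This sentence is long enough that "
           "we split it over two source lines.")

print(greeting)
print(message)
```

Because the joining happens when the code is compiled, it only works for literals, not for variables that happen to hold strings.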
You can read more about universal newline support in PEP 278. We can use escape sequences for other purposes, too, such as incorporating tabs with \t or quoting characters within strings by using \" or \'. See how Python is smarter than we are at choosing the most convenient quote delimiters, although Python will also resort to escape sequences when we use both types of quotes in a string. Because the backslash has special meaning, to place a backslash in a string we escape the backslash with itself. To reassure ourselves that there really is only one backslash in that string, we can print it. You can read more about escape sequences in the Python documentation at python.org.
Sometimes, particularly when dealing with strings such as Windows file system paths or regular expression patterns, which use backslashes extensively, the requirement to double up on backslashes can be ugly and error prone. Python comes to the rescue with its raw strings. Raw strings don’t support any escape sequences and are very much what you see is what you get. To create a raw string, prefix the opening quote with a lowercase r. We can use the string constructor to create string representations of other types such as integers or floats.
Strings in Python are what are called sequence types, which means they support certain common operations for querying sequences. For example, we can access individual characters using square brackets with an integer 0‑based index. Note that in contrast to many programming languages, there is no separate character type distinct from the string type. The indexing operation we just used returns a full‑blown string that contains a single character element, something we can test using Python’s built‑in type function. There will be more on types and classes later. String objects also support a wide variety of operations implemented as methods.
Escape Sequences
>>> "This is a \" in a string"
'This is a " in a string'
>>> 'This is a \' in a string'
"This is a ' in a string"
>>> 'This is a \" and a \' in a string'
'This is a " and a \' in a string'
>>> k = 'A \\ in a string'
>>> k
'A \\ in a string'
>>> print(k)
A \ in a string
>>>
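Raw strings, the str constructor, and indexing, all described above, can be tried together in a few lines. This is a minimal sketch; the Windows path and the sample values are made up for illustration.

```python
# A raw string keeps every backslash literally; the path is hypothetical.
path = r'C:\Users\Merlin\Documents\Spells'
print(path)

# The str constructor builds string representations of other types.
as_text = str(496)
big = str(6.02e23)

# Indexing is zero-based and returns a one-character str,
# because Python has no separate character type.
s = 'parrot'
fifth = s[4]
print(as_text, big, fifth, type(fifth))
```

Without the r prefix, the same path would need every backslash doubled.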
We can list those methods using help on the string type. Ignore all the hieroglyphics with underscores for now and page down until you see the documentation for the capitalize method. Press q to quit the help browser, and we’ll try to use that method. First, let’s make a string that deserves capitalization, the proper noun of a capital city, no less. To call methods on objects in Python, we use a dot after the object name and before the method name. Methods are functions, so we must use parentheses to indicate that the method should be called.
Remember that strings are immutable, so the capitalize method didn’t modify c in place; rather, it returned a new string. We can verify this by displaying c, which remains unchanged. You might like to spend a little time familiarizing yourself with the various useful methods provided by the string type. Finally, because strings are fully Unicode capable, we can use them with international characters easily, even in literals, because the default source code encoding for Python 3 is UTF‑8. For example, if you have access to Norwegian characters, you can simply type them directly into a string literal.
Alternatively, you can use the hexadecimal representations of Unicode code points as an escape sequence prefixed by \u, which I’m sure you’ll agree, is somewhat more unwieldy. Similarly, you can use the \x escape sequence, followed by a two‑character hexadecimal string or an escaped octal string to include Unicode characters in a string literal. There are no such Unicode capabilities in the otherwise similar bytes type, which we’ll look at next.
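A short sketch of the method call and the Unicode escapes discussed above follows. The city name and the Norwegian sentence are assumptions chosen for illustration, since the post does not reproduce the original examples.

```python
# capitalize returns a new string; the original is left unchanged.
c = "oslo"
capitalized = c.capitalize()
print(capitalized, c)

# The same text written with characters typed directly...
typed = "Vi er så glad for å høre og lære om Python!"
# ...and with \u (four hex digits), \x (two hex digits) and octal escapes.
escaped = "Vi er s\u00e5 glad for \u00e5 h\xf8re og l\346re om Python!"
print(typed == escaped)
```

To browse the full set of string methods, run help(str) at the REPL.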
Bytes:
Bytes are very similar to strings, except that rather than being sequences of Unicode code points, they are sequences of, well, bytes. As such, they are used for raw binary data and fixed‑width single‑byte character encodings such as ASCII. As with strings, they have a simple literal form using quotes, the first of which is prefixed by a lowercase b. There is also a bytes constructor, but it’s an advanced feature and we won’t cover it in this fundamentals module.
At this point, it’s sufficient for us to recognise bytes literals and understand that they support most of the same operations as strings, such as indexing, which returns the integer value of the specified byte, and splitting, which you’ll see returns a list of bytes objects. To convert between bytes and strings, we must know the encoding of the byte sequence used to represent the string’s Unicode code points as bytes. Python supports a wide variety of encodings, a full list of which can be found at python.org.
Bytes
>>> b'data'
b'data'
>>> b"data"
b'data'
>>> d = b'some bytes'
>>> d[0]
115
>>> d.split()
[b'some', b'bytes']
>>>
Let’s start with an interesting Unicode string which contains all the characters of the 29‑letter Norwegian alphabet, a pangram. We’ll now encode that using UTF‑8 into a bytes object. See how the Norwegian characters have each been rendered as pairs of bytes. We can reverse that process using the decode method of the bytes object. Again, we must supply the correct encoding. We can check that the result is equal to what we started with, and display it for good measure.
This may seem like an unnecessary detail so early in the module, especially if you operate in an anglophone environment, but it’s a crucial point to understand since files and network resources such as HTTP responses are transmitted as byte streams, whereas we often prefer to work with the convenience of Unicode strings.
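The encode and decode round trip described above can be sketched as follows. The Norwegian sentence is an assumed sample, as the post does not show the original string.

```python
# An assumed sample sentence containing the Norwegian letters å, æ and ø.
norsk = "Jeg begynte å fortære en sandwich mens jeg kjørte taxi på vei til quiz"

# str.encode turns Unicode code points into bytes using the named encoding.
data = norsk.encode("utf-8")
print(data)

# bytes.decode reverses the process; the encoding must match.
round_trip = data.decode("utf-8")
print(round_trip == norsk)

# Each non-ASCII letter here occupies two bytes in UTF-8,
# so the bytes object is longer than the string.
print(len(norsk), len(data))
```

Decoding with the wrong encoding either raises UnicodeDecodeError or silently produces the wrong characters, which is why the encoding has to be known in advance.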
List:
Python lists, such as those returned by the string split method, are sequences of objects. Unlike strings, lists are mutable, insofar as the elements within them can be replaced or removed, and new elements can be inserted or appended. Lists are a workhorse of Python data structures. Literal lists are delimited by square brackets, and the items within the list are separated by commas. Here is a list of three numbers and a list of three strings. We can retrieve elements by using square brackets with a zero‑based index, and we can replace elements by assigning to a specific element.
See how lists can be heterogeneous with respect to the types of the objects. We now have a list containing a string, an integer, and another string. It’s often useful to create an empty list, which we can do using empty square brackets. We can modify the list in other ways. Let’s add some floats to the end of the list using the append method. There are many other useful methods for manipulating lists, which we’ll cover in a later module.
Lists
>>> [1, 9, 8]
[1, 9, 8]
>>> a = ["apple", "orange", "pear"]
>>> a[1]
'orange'
>>> a[1] = 7
>>> a
['apple', 7, 'pear']
>>> b = []
>>> b.append(1.618)
>>> b
[1.618]
>>> b.append(1.414)
>>> b
[1.618, 1.414]
>>> list("characters")
['c', 'h', 'a', 'r', 'a', 'c', 't', 'e', 'r', 's']
>>> c = ['bear',
...      'giraffe',
...      'elephant',
...      'caterpillar',]
>>> c
['bear', 'giraffe', 'elephant', 'caterpillar']
>>>
There’s also a list constructor, which can be used to create lists from other collections such as strings. Finally, although the significant whitespace rules in Python can at first seem very rigid, there is a lot of flexibility. For example, if brackets, braces, or parentheses are still unclosed at the end of a line, you can continue on the next line. This can be very useful for long literal collections, or simply to improve readability. See also how we’re allowed to use an additional comma after the last element. This is an important maintainability feature.
Dict:
Dictionaries are completely fundamental to the way the Python language works and are very widely used. A dictionary maps keys to values, and in other languages, is known as a map or an associative array. Let’s look at how to create and use them in Python. Literal dictionaries are created using curly braces containing key‑value pairs. Each pair is separated by a comma, and each key is separated from the corresponding value by a colon. Here, we use a dictionary to create a simple telephone directory.
We can retrieve items by key using the square brackets operator and update the value associated with a key by assigning through the square brackets. If we assign to a key that has not yet been added, a new entry is created. Be aware that in Python versions prior to 3.7, the entries in the dictionary can’t be relied upon to be stored in any particular order. As of Python 3.7, however, entries are required to be kept in insertion order. As with lists, empty dictionaries can be created using empty curly braces. We’ll revisit dictionaries in much more detail in a later module.
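The dictionary operations described above can be sketched with a small telephone directory. The names and numbers are invented for illustration.

```python
# A literal dictionary: key-value pairs separated by commas,
# each key separated from its value by a colon.
d = {'alice': '878-8728-922', 'bob': '256-5262-124', 'eve': '198-2321-787'}

# Retrieve a value by key.
alice_number = d['alice']

# Update the value stored under an existing key.
d['alice'] = '966-4532-6272'

# Assigning to a key that is not yet present creates a new entry.
d['charles'] = '334-5551-913'

# Empty curly braces create an empty dictionary.
e = {}

print(alice_number)
print(d)
print(list(d))
```

On Python 3.7 or later, list(d) yields the keys in insertion order, so the new entry appears last.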
For-loop:
Now that we have the tools to make some interesting data structures, we’ll look at Python’s second type of loop construct, the for loop. For loops in Python correspond to what are called for‑each loops in many other programming languages. They request items one by one from a collection, or more strictly, from an iterable series (but more on that later), and assign them in turn to a variable that we specify. Let’s create a collection and use a for loop to iterate over it.
For-loop
>>> cities = ["London", "New York", "Paris", "Oslo", "Helsinki"]
>>> for city in cities:
...     print(city)
...
London
New York
Paris
Oslo
Helsinki
>>> colors = {'crimson': 0xdc143c, 'coral': 0xff7f50, 'teal': 0x008080}
>>> for color in colors:
...     print(color, colors[color])
...
crimson 14423100
coral 16744272
teal 32896
>>>
If you iterate over dictionaries, you get the keys, which you can then use within the for loop body to retrieve values. Here we define a dictionary mapping string color names to hexadecimal integer color codes. Note that we used the ability of the built‑in print function to accept multiple arguments. We passed the key and the value for each color separately. See also how the color codes returned to us are in decimal.
Putting it all Together:
In this last section, before we summarise, we’re going to write a longer snippet at the REPL. We’re going to fetch some text data for some classic literature from the web using a Python standard library function called urlopen. To get access to urlopen, we need to import the function from the request module within the standard library urllib package. Next, we’re going to call urlopen with a URL to our story, then create an empty list, which ultimately will hold all of the words from the text.
Next, we open a for loop, which will work through the story. Recall that for loops request items one by one from the term on the right of the in keyword, in this case story, and assign them in turn to the name on the left, in this case line. It so happens that the type of HTTP response represented by story yields successive lines of text when iterated over in this way. So the for loop retrieves one line of text at a time from Dickens’ classic. Note also that the for statement is terminated by a colon because it introduces the body of the for loop, which is a new block and hence a further level of indentation.
For each line of text, we use the split method to divide it into words on whitespace boundaries, resulting in a list of words we call line_words. Now, we use a second for loop nested inside the first to iterate over this list of words, appending each in turn to the accumulating story_words list. Now, we enter a blank line at the three‑dots prompt to close all open blocks. In this case, the inner for loop and the outer for loop will both be terminated. The block will be executed, and after a short delay, Python returns us to the regular triple‑arrow prompt.
At this point, if Python gives you an error, such as a syntax error or indentation error, you should go back, review what you entered, and carefully re‑enter the code until Python accepts the whole block without complaint. If you get an HTTP error, then you were unable to fetch the resource over the internet, and you should try again later, although it’s worth checking that you typed the URL correctly. Finally, now that we’re done reading from the URL, we need to close our handle to it, story. We can look at the words we read simply by asking Python to evaluate the value of story_words. Here, we can see the list of words.
Notice that each of the single‑quoted words is prefixed by a lowercase letter b, meaning that we have a list of bytes objects where we would have preferred a list of strings. This is because the HTTP request transferred raw bytes to us over the network. To get a list of strings, we should decode the bytes of each line into a Unicode string. We can do this by inserting a call to the decode method of the bytes object and then operating on the resulting Unicode string. The Python REPL supports a simple command‑line history, and by careful use of the up and down‑arrow keys, we can re‑enter our snippet.
When we get to the line which needs to be changed, we can edit it using the left and right‑arrow keys to insert the requisite call to decode. Then when we rerun the block and take a fresh look at story_words, we should see we have a list of strings. We’ve just about reached the limits of what’s possible to comfortably edit at the Python REPL. So in the next course module, we’ll look at how to move this code into a Python module where it can be more easily worked with in a text editor.
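The word‑collecting loop described in this section can be sketched so that it runs without a network connection. This is a minimal sketch under stated assumptions: the collect_words function is a hypothetical helper, and an in‑memory BytesIO object stands in for the HTTP response, because it also yields successive lines of bytes when iterated over. With a real connection you would instead pass the object returned by urlopen, imported from urllib.request, and call its close method when done.

```python
from io import BytesIO


def collect_words(byte_lines, encoding="utf-8"):
    """Return all whitespace-separated words from an iterable of bytes lines."""
    story_words = []
    for line in byte_lines:
        # Decode each line of bytes to str before splitting on whitespace.
        line_words = line.decode(encoding).split()
        for word in line_words:
            story_words.append(word)
    return story_words


# Stand-in for the response returned by urlopen; the text is a short sample.
story = BytesIO(b"It was the best of times\nit was the worst of times\n")
words = collect_words(story)
story.close()
print(words)
```

Leaving out the call to decode would give a list of bytes objects, each echoed with a b prefix, which is exactly the problem the section describes.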
Summary:
There are a lot of details in this module, which may be difficult to remember all at once, but which you’ll find you use very frequently when writing Python code. First, we looked at strings, in particular the various forms of quoting for single and multiline strings. We saw how adjacent string literals are implicitly concatenated. Python has support for universal newlines, so no matter what platform you’re using, it’s sufficient to use a single \n character, safe in the knowledge that it will be appropriately translated from and to the native newline during I/O.
- Single- and multi-line literals
- Concatenation of adjacent literals
- Universal newlines
- Escape sequences
- Raw strings
- Use str constructor to convert other types
- Access individual characters with square bracket indexing
- Rich API
- String literals can contain Unicode
Escape sequences provide an alternative means of incorporating newlines and other control characters into literal strings. The backslashes used for escaping can be a hindrance for Windows file system paths or regular expressions, so raw strings with the r prefix can be used to suppress the escaping mechanism. Other types, such as integers, can be converted to strings using the str constructor. Individual characters, returned as one‑character strings, can be retrieved using square brackets with integer zero‑based indices. Strings support a rich variety of operations, such as splitting, through their methods.
In Python 3, literal strings can contain Unicode characters directly in the source. The bytes type has many of the capabilities of strings, but it is a sequence of bytes rather than a sequence of Unicode code points. Bytes literals are prefixed with a lowercase b. To convert between string and bytes instances, we use the encode method of str and the decode method of bytes. In both cases, we pass the encoding, which we must know in advance.
Lists are mutable, heterogeneous sequences of objects. List literals are delimited by square brackets, and the items are separated by commas. As with strings, individual elements can be retrieved by indexing into a list with square brackets. In contrast to strings, individual list elements can be replaced by assigning to the indexed item. Lists can be grown by appending to them, and they can be constructed from other sequences using the list constructor. Dictionaries associate keys with values. Literal dictionaries are delimited by curly braces.
The key‑value pairs are separated from each other by commas, and each key is associated with its corresponding value by a colon. For loops take items one by one from an iterable object, such as a list, and bind a name to the current item. They correspond to what are called for‑each loops in other languages. In the next module of Core Python: Getting Started, we’ll look at functions and modules, Python’s fundamental tools for organizing code. These tools will facilitate writing larger programs and structuring your code into cohesive, reusable components.