11. Strings and Text Data


Introduction

Most data in the world can be stored as text. Even values that will eventually be numeric often arrive first as text, so it’s important to be able to work with text data. This chapter isn’t specific to Pandas. We mainly explore how to manipulate strings in plain Python, without Pandas. The following chapters cover more Pandas material. After that, we come back to strings and see how they tie back to Pandas. As an aside, some of the string examples in this chapter come from Monty Python and the Holy Grail.

Learning Objectives

Recall how to subset containers and sequences
Recognize that strings are a type of container object
Modify strings based on use case
Create regular expression patterns to match strings
Combine prose text with code output into a single sentence

11.1 Strings

In Python, a string is a sequence of characters. You create a string by enclosing its characters in a matching pair of single or double quotes. Below are two strings, grail and a scratch. They are assigned to the variables word and sent, respectively.

word = 'grail'
sent = 'a scratch'
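This minimal sketch shows that the choice of quote does not affect the result. Single-quoted and double-quoted literals produce equal str objects. Double quotes are handy when the text itself contains an apostrophe. The names same_word and wound are illustrative additions.

```python
word = 'grail'
sent = 'a scratch'

# Single and double quotes create the same kind of object with the same value
same_word = "grail"
print(word == same_word)  # True
print(type(word))         # <class 'str'>

# Double quotes let an apostrophe appear inside the string without escaping
wound = "It's just a flesh wound!"
print(len(wound))         # 24
```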

So far in this book, we have seen columns of strings represented with the object dtype.

11.1.1 Subset and Slice Strings

A string can be thought of as a container of characters. You can subset a string like any other Python container (e.g., list or Series).
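The container comparison becomes concrete if you place grail next to a list of its letters and apply the same operations to both. This is a small illustrative sketch, and the list letters is a hypothetical helper.

```python
word = 'grail'

# The same characters stored in a list, for comparison
letters = ['g', 'r', 'a', 'i', 'l']

# Strings and lists share the same subsetting and length operations
print(word[1], letters[1])      # r r
print(len(word), len(letters))  # 5 5
print('r' in word)              # True

# Converting a string to a list breaks it into its individual characters
print(list(word) == letters)    # True
```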

Table 11.1 and Table 11.2 show each character of the two strings alongside its positive and negative index. This information will help you follow the examples in which we slice values using the index.

Table 11.1 Index Positions for the String "grail"

index	0	1	2	3	4
string	g	r	a	i	l
neg index	–5	–4	–3	–2	–1

Table 11.2 Index Positions for the String "a scratch"

index	0	1	2	3	4	5	6	7	8
string	a	(space)	s	c	r	a	t	c	h
neg index	–9	–8	–7	–6	–5	–4	–3	–2	–1
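You can generate the index rows in Table 11.2 instead of reading them off the page. This minimal sketch pairs each character of sent with its positive index. It also computes the matching negative index by subtracting the string's length. repr() makes the space at position 1 visible. The rows list exists only for illustration.

```python
sent = 'a scratch'

rows = []
for pos, char in enumerate(sent):
    # The negative index counts backward from the end: pos - len(sent)
    neg = pos - len(sent)
    rows.append((pos, neg, char))
    print(pos, neg, repr(char))
```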

11.1.1.1 Single Letter

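To get a single character, we put its index inside square brackets, [ ]. This is the same notation we used in Section 1.3 when we looked at various slices of data. For example, word[0] returns the first letter of grail. sent[3] returns the fourth character of a scratch, because the space at index 1 counts as a character.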

print(word[0])  # g
print(sent[3])  # c
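The negative index rows in Table 11.1 and Table 11.2 work with the same bracket notation. This is the usual way to get the last character without knowing the string's length. Below is a minimal sketch. It also shows that an index past the end raises an IndexError instead of returning an empty value.

```python
word = 'grail'
sent = 'a scratch'

# Negative indices count backward from the end of the string
print(word[-1])   # l
print(sent[-9])   # a (the same character as sent[0])

# Indexing past the last position raises an IndexError
try:
    word[5]
except IndexError as err:
    print('IndexError:', err)
```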

11.1.1.2 Slice Multiple Letters

To get a range of characters, we can use slicing notation (Appendix L). A slice written as [start:stop] includes the character at start but stops just before the character at stop.
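Here is a short sketch of Python's standard [start:stop:step] slicing on the two example strings. You can check each result against the index positions in Table 11.1 and Table 11.2. It also shows what happens when you leave out start or stop, and how the optional step value works.

```python
word = 'grail'
sent = 'a scratch'

# [start:stop] includes start and stops just before stop
print(word[0:3])    # gra
print(sent[2:5])    # scr

# Negative indices work in slices too
print(sent[-7:-2])  # scrat

# Omitting start or stop slices from the beginning or to the end
print(word[:2])     # gr
print(word[2:])     # ail

# A third number sets the step; a step of -1 walks the string backward
print(word[::2])    # gal
print(sent[::-1])   # hctarcs a
```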

Method	Description
`.capitalize()`	Capitalizes the first character
`.count()`	Counts the number of occurrences of a substring
`.startswith()`	`True` if the string begins with the specified value
`.endswith()`	`True` if the string ends with the specified value
`.find()`	Lowest index where the substring is found, `-1` if there is no match
`.index()`	Same as `.find()` but raises `ValueError` if there is no match
`.isalpha()`	`True` if all characters are alphabetic
`.isdecimal()`	`True` if all characters are decimal numbers (see documentation as well as `.isdigit()`, `.isnumeric()`, and `.isalnum()`)
`.isalnum()`	`True` if all characters are alphanumeric (alphabetic or numeric)
`.lower()`	Copy of a string with all lowercase letters
`.upper()`	Copy of string with all uppercase letters
`.replace()`	Copy of a string with the `old` values replaced with `new`
`.strip()`	Removes leading and trailing whitespace; also see `lstrip` and `rstrip`
`.split()`	Returns a list of values split by the delimiter (separator)
`.partition()`	Similar to `split(maxsplit=1)` but also returns the separator
`.center()`	Centers the string to a given width
`.zfill()`	Copy of string left filled with `'0'`
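Several methods in the table return a copy of the string. That is because Python strings are immutable: a method such as .upper() builds a new string and leaves the original untouched. In this minimal sketch, the variable names knight and shouted are illustrative.

```python
knight = "Black Knight"

# String methods return a new string; the original is unchanged
shouted = knight.upper()
print(shouted)  # BLACK KNIGHT
print(knight)   # Black Knight

# To keep the modified version, assign the result back to a name
knight = knight.lower()
print(knight)   # black knight
```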

Code	Results
`"black Knight".capitalize()`	`'Black knight'`
`"It's just a flesh wound!".count('u')`	`2`
`"Halt! Who goes there?".startswith('Halt')`	`True`
`"coconut".endswith('nut')`	`True`
`"It's just a flesh wound!".find('u')`	`6`
`"It's just a flesh wound!".index('scratch')`	`ValueError`
`"old woman".isalpha()`	`False` (contains a space)
`"37".isdecimal()`	`True`
`"I'm 37".isalnum()`	`False` (apostrophe and space)
`"Black Knight".lower()`	`'black knight'`
`"Black Knight".upper()`	`'BLACK KNIGHT'`
`"flesh wound!".replace('flesh wound', 'scratch')`	`'scratch!'`
`" I'm not dead. ".strip()`	`"I'm not dead."`
`"NI! NI! NI! NI!".split(sep=' ')`	`['NI!', 'NI!', 'NI!', 'NI!']`
`"3,4".partition(',')`	`('3', ',', '4')`
`"nine".center(10)`	`'   nine   '`
`"9".zfill(5)`	`'00009'`
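The most reliable way to confirm what a string method returns is to run it. This sketch evaluates several of the expressions above and contrasts .find(), which returns -1 when nothing matches, with .index(), which raises ValueError. Note that .center() and .zfill() accept the width only as a positional argument. The variable quote is an illustrative name.

```python
quote = "It's just a flesh wound!"

print("black Knight".capitalize())  # Black knight
print(quote.count('u'))             # 2
print(quote.find('u'))              # 6
print(quote.find('scratch'))        # -1 (find signals "not found" with -1)

# index() raises ValueError instead of returning -1
try:
    quote.index('scratch')
except ValueError as err:
    print('ValueError:', err)

# The width for center() and zfill() is passed positionally
print(repr("nine".center(10)))      # '   nine   '
print("9".zfill(5))                 # 00009
print("3,4".partition(','))         # ('3', ',', '4')
```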

Syntax	Description
`.`	Matches any one character
`^`	Anchors the match to the beginning of the string
`$`	Anchors the match to the end of the string
`*`	Matches zero or more repetitions of the previous character
`+`	Matches one or more repetitions of the previous character
`?`	Matches zero or one repetition of the previous character
`{m}`	Matches `m` repetitions of the previous character
`{m,n}`	Matches any number from `m` to `n` of the previous character
`\`	Escape character (e.g., `\.` matches a literal period)
`[ ]`	A set of characters (e.g., `[a-z]` matches any one letter from a to z)
`|`	OR; `A|B` will match `A` or `B`
`( )`	Groups the pattern inside the parentheses and captures what it matches
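Below is a minimal sketch that exercises most of the syntax above with Python's built-in re module. re.fullmatch() requires the whole string to match, while re.search() can match anywhere. The test strings reuse quotes from this chapter. The short patterns are illustrative choices, not examples from the book.

```python
import re

halt = "Halt! Who goes there?"

# . matches any single character
print(bool(re.fullmatch(r'gr.il', 'grail')))           # True

# ^ and $ anchor the pattern to the start and end of the string
print(bool(re.search(r'^Halt', halt)))                  # True
print(bool(re.search(r'there\?$', halt)))               # True (\? is a literal ?)

# ? makes the previous character optional; * allows zero or more
print(bool(re.fullmatch(r'wounds?', 'wound')))          # True
print(bool(re.fullmatch(r'co*', 'cooo')))               # True

# {m} requires exactly m repetitions
print(bool(re.fullmatch(r'9{3}', '999')))               # True

# [ ] matches any one character from the set
print(re.findall(r'[aeiou]', 'coconut'))                # ['o', 'o', 'u']

# | matches either alternative
print(bool(re.fullmatch(r'grail|scratch', 'scratch')))  # True

# ( ) groups a pattern, so + repeats the whole group
ni = re.match(r'(NI! )+', 'NI! NI! NI! NI!')
print(repr(ni.group(0)))                                # 'NI! NI! NI! '
```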

Sequence	Description
`\d`	A digit
`\D`	Any character NOT a digit (opposite of `\d`)
`\s`	Any whitespace character
`\S`	Any character NOT whitespace (opposite of `\s`)
`\w`	A word character (letter, digit, or underscore)
`\W`	Any character NOT a word character (opposite of `\w`)
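This sketch applies the special sequences to strings from this chapter using re.findall(), re.sub(), and re.split(). The unevenly spaced "NI!  NI!   NI!" is a hypothetical variant of the chapter's NI! string. It shows why a whitespace pattern can be more flexible than splitting on a single space.

```python
import re

quote = "It's just a flesh wound!"

# \d+ : one or more digits in a row
print(re.findall(r'\d+', "I'm 37"))        # ['37']

# \D : any non-digit; replacing each one with '' keeps only the digits
print(re.sub(r'\D', '', "I'm 37"))         # 37

# \w+ : runs of word characters (the apostrophe is not one)
print(re.findall(r'\w+', quote))           # ['It', 's', 'just', 'a', 'flesh', 'wound']

# \s+ : runs of whitespace, a flexible delimiter for splitting
print(re.split(r'\s+', "NI!  NI!   NI!"))  # ['NI!', 'NI!', 'NI!']

# \S+ : runs of non-whitespace characters
print(re.findall(r'\S+', quote))           # ["It's", 'just', 'a', 'flesh', 'wound!']
```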
