4 Working with strings

4.1 Learning objectives

Transform strings
Combine strings
Split strings
Subset strings

4.2 Dealing with messy or unstructured data

In the last chapter, we focused on importing data into tibbles and then reshaping them to fit the tidy data criteria. In most cases, we had data with some structure, which we transformed into a different structure. This week, we look at working with strings for three reasons: cleaning messy data, filtering rows based on partial string matches, and extracting data from text.

4.2.1 Cleaning messy data

Sometimes, you may have data with a correct tidy structure, but the data itself is not clean and contains errors, unnecessary characters, or unwanted spelling or formatting variants. We need to clean that data before we can produce our analysis or report. Here is an example:

full_names (messy)	full_names (tidy)
Colin Conrad, PhD	Colin Conrad
MACDONALD, Betrum	Bertrum MacDonald
Dr. Louise Spiteri	Louise Spiteri
Mongeon, Philippe	Philippe Mongeon
jennifer grek-martin	Jennifer Grek-Martin

4.2.2 Filtering rows based on string matches

In the last chapter, we learned how to filter rows of a tibble based on the value contained in a cell or based on the row number. This week, we will add to our toolbox some string matching functions that check if a string of characters is found within a larger string of characters. One example could be retrieving a set of course codes starting with INFO or MGMT in a vector containing the course codes of all offerings of Dalhousie University.

4.2.3 Extracting data from text

Sometimes, you may have to deal with unstructured data such as a long character string containing data elements we wish to extract. This string, for example:

I am taking several courses offered at SIM this Winter. There is INFO6270 (Introduction to Data Science) and also INFO6540 and the information policy one, which I think has the course code INFO6610.

Maybe you had the brilliant idea to use a free text field in a survey to collect information about the courses that students are taking this Winter, and you now have three thousand responses that look like this one. This unstructured data needs to be structured before it can be analyzed, and in this specific example, R can help! This kind of task can be relatively simple, but it can also get quite complex. In this chapter, we will not do very complex data extractions from strings.
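As a preview, here is a minimal sketch of how the course codes could be pulled out of the response above. It assumes that every course code is made of four uppercase letters followed by four digits (an assumption about the data, not a rule stated anywhere in the survey), and it relies on str_extract_all() and a regular expression, both of which appear later in this chapter. The variable names response and course_codes are only illustrative.

```r
library(stringr)

# The free-text response from the example above
response <- "I am taking several courses offered at SIM this Winter. There is INFO6270 (Introduction to Data Science) and also INFO6540 and the information policy one, which I think has the course code INFO6610."

# Assumption: a course code is four uppercase letters followed by four digits
course_codes <- str_extract_all(response, "[A-Z]{4}[0-9]{4}")[[1]]

print(course_codes)
```

The [[1]] at the end is there because str_extract_all() returns a list with one element per input string, and we only have one string here.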

4.3 The stringr package

The stringr package (https://stringr.tidyverse.org) is part of the tidyverse and contains a collection of functions that perform all kinds of operations on strings. Let’s go through some of those tasks and some code examples.

4.3.1 Transforming strings

4.3.1.1 Changing string character case

One simple transformation you may want to perform on a string is changing its case. This is very easily done with the str_to_lower(), str_to_upper(), str_to_sentence(), and str_to_title() functions.

Statement	Output
str_to_lower("HeLlO WoRlD!")	hello world!
str_to_upper("HeLlO WoRlD!")	HELLO WORLD!
str_to_sentence("HeLlO WoRlD!")	Hello world!
str_to_title("HeLlO WoRlD!")	Hello World!

4.3.1.1.1 Vector example

# I load the tidyverse, which includes the stringr package
library(tidyverse)

# I create a vector with character strings
vector <- c("I like coding with R","i like coding in R","R IS AMAZING!","I LoVe R")

# I convert them all to lowercase.
str_to_lower(vector)

[1] "i like coding with r" "i like coding in r"   "r is amazing!"       
[4] "i love r"

4.3.1.1.2 Tibble example

# I create a tibble with inconsistent strings
t <- tibble(comments = c("I like coding with R","i like coding in R","R IS AMAZING!","I LoVe R"))

# I use the mutate() and str_to_lower() functions to modify the messy column and make the strings consistent.
t %>% 
  mutate(comments = str_to_lower(comments))

# A tibble: 4 × 1
  comments            
  <chr>               
1 i like coding with r
2 i like coding in r  
3 r is amazing!       
4 i love r

4.3.1.2 Replacing parts of strings

The functions str_replace() and str_replace_all() modify strings by replacing a pattern with another. The difference between the two is that str_replace() will only replace the first instance of the pattern in the string, while str_replace_all() will replace all the instances.

4.3.1.2.1 Vector example

# I create a vector with two strings.
names <- c("dr Mike Smit","dr Sandra Toze")

# I replace the first instance of the pattern "dr" with "doctor". 
names %>% 
  str_replace("dr","doctor")

[1] "doctor Mike Smit"   "doctor Sandra Toze"

Let’s see what happens if I use the same example but use str_replace_all() instead of str_replace().

# I create a vector with two strings.
names <- c("dr Mike Smit","dr Sandra Toze")

# I replace ALL instances of the pattern "dr" with "doctor". 
names %>% 
  str_replace_all("dr","doctor")

[1] "doctor Mike Smit"       "doctor Sandoctora Toze"

The second string got messed up because the second “dr” pattern in Sandra also got replaced with the pattern “doctor”.
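One possible way around this problem, offered here as a sketch rather than as the method used in this chapter, is to make the pattern more specific. In a regular expression (introduced later in this chapter), \\b stands for a word boundary, so the pattern below only matches “dr” when it stands alone as a word. The variable name clean_names is only illustrative.

```r
library(stringr)

names <- c("dr Mike Smit", "dr Sandra Toze")

# \\b matches a word boundary, so the "dr" inside "Sandra" is left alone
clean_names <- str_replace_all(names, "\\bdr\\b", "doctor")

print(clean_names)
```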

4.3.1.3 Removing parts of strings

The str_remove() and str_remove_all() functions are equivalent to str_replace("some pattern", "") and str_replace_all("some pattern", ""). They can make our code a little cleaner by not requiring that we specify that we want to replace a given pattern with nothing.

# I create a vector with names
names <- c("dr Mike Smit","dr Sandra Toze")

# I remove the first instance of the pattern "dr" from the names.
names %>% 
  str_remove("dr")

[1] " Mike Smit"   " Sandra Toze"

4.3.1.3.1 Tibble example

# I create a tibble with professor names.
t <- tibble(names = c("dr Mike Smit","dr Sandra Toze"))

# I remove all instances of the pattern "dr" in the names.
t %>% 
  mutate(names = str_remove_all(names, "dr"))

# A tibble: 2 × 1
  names       
  <chr>       
1 " Mike Smit"
2 " Sana Toze"

We can see that again removing all the “dr” patterns from the strings caused a problem because the pattern is also found in the name “Sandra”.
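A minimal sketch of one way to avoid this, assuming the title always appears at the very beginning of the name: the “^” symbol in a regular expression (see the regular expressions section later in this chapter) anchors the pattern to the start of the string. The variable name cleaned is only illustrative.

```r
library(tidyverse)

t <- tibble(names = c("dr Mike Smit", "dr Sandra Toze"))

# "^" anchors the pattern to the start of the string, and the trailing
# space in "^dr " removes the leftover leading space as well
cleaned <- t %>%
  mutate(names = str_remove_all(names, "^dr "))

print(cleaned)
```

Because the pattern can only match at the start of each string, the “dr” in “Sandra” is no longer affected.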

4.3.2 Removing extra spaces

The str_squish() function is a quick and easy way to remove unwanted spaces before or after a string, as well as consecutive spaces within a string.

messy_string <- "   My cat just    stepped on the spacebar  as I was writing this      "

# Let's print the string to see what it looks like
messy_string

[1] "   My cat just    stepped on the spacebar  as I was writing this      "

# Let's squish it!
str_squish(messy_string)

[1] "My cat just stepped on the spacebar as I was writing this"

The str_trim() function is similar to str_squish() but allows you to specify which side of the string you wish to remove extra spaces from. However, it only handles leading spaces (at the beginning) and trailing spaces (at the end) of strings and cannot remove extra spaces in the middle of a string.

string <- "  hello   world    "

# remove spaces at the beginning
string %>% 
  str_trim("left")

[1] "hello   world    "

# remove spaces at the end
string %>% 
  str_trim("right")

[1] "  hello   world"

# remove spaces at the beginning and at the end
string %>% 
  str_trim("both")

[1] "hello   world"
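To make the difference between the two functions concrete, here is a short side-by-side sketch on the same string. The variable names trimmed and squished are only illustrative.

```r
library(stringr)

string <- "  hello   world    "

# str_trim() removes spaces on both sides by default (side = "both")
trimmed <- str_trim(string)

# str_squish() also collapses the repeated spaces inside the string
squished <- str_squish(string)

print(trimmed)
print(squished)
```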

4.3.3 Combine strings

We already learned how to use the unite() function of the tidyr package to concatenate multiple data frame columns into one. However, the unite() function works only with data frames as input, which can be limiting. The stringr package offers a str_c() function that works with vectors, so it’s good to know how to use both functions.

4.3.3.0.1 Vector example

# I create a vector with first names
first_names = c("Bertrum", "Colin", "Louise")

# I create a vector with last names
last_names = c("MacDonald", "Conrad", "Spiteri")

# I combine my vectors into a new vector with full names
full_names <- str_c(first_names, last_names, sep = " ")

# I print the vector
print(full_names)

[1] "Bertrum MacDonald" "Colin Conrad"      "Louise Spiteri"

Another advantage of str_c() over unite() is that it is more flexible in terms of the strings that get concatenated. You could combine the content of two vectors and add any pattern you want to any string.

# I create a tibble with two columns containing first and last names.
my_tibble = tibble(first_name = c("Bertrum", "Colin", "Louise"),
               last_name = c("MacDonald", "Conrad", "Spiteri"))

# I add a column to my tibble with full_names
my_tibble %>% 
  mutate(full_name = str_c(first_name, last_name, sep=" "))

# A tibble: 3 × 3
  first_name last_name full_name        
  <chr>      <chr>     <chr>            
1 Bertrum    MacDonald Bertrum MacDonald
2 Colin      Conrad    Colin Conrad     
3 Louise     Spiteri   Louise Spiteri

# I add a column to my tibble with full_names and include the Dr. pattern at the beginning of the name.
my_tibble %>% 
  mutate(full_name = str_c("Dr.", first_name, last_name, sep=" "))

# A tibble: 3 × 3
  first_name last_name full_name            
  <chr>      <chr>     <chr>                
1 Bertrum    MacDonald Dr. Bertrum MacDonald
2 Colin      Conrad    Dr. Colin Conrad     
3 Louise     Spiteri   Dr. Louise Spiteri
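As a further illustration of this flexibility, here is a small sketch that places a literal string between two vectors and then uses the collapse argument of str_c() to join everything into a single string. The variable names indexed_names and one_string are only illustrative.

```r
library(stringr)

first_names <- c("Bertrum", "Colin", "Louise")
last_names <- c("MacDonald", "Conrad", "Spiteri")

# The literal ", " is recycled, so it is inserted between every pair of names
indexed_names <- str_c(last_names, ", ", first_names)

# collapse joins the elements of the result into a single string
one_string <- str_c(indexed_names, collapse = "; ")

print(indexed_names)
print(one_string)
```

When no sep is given, str_c() joins its arguments without adding anything between them, which is why the comma and space are passed as their own argument.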

4.3.4 Splitting strings

The str_split() function does the same thing as the separate() function that we learned about in chapter 3. They have slightly different syntax and arguments, but the main difference between the two functions is that str_split() works with vectors and returns a list, while separate() works with data frames and returns a data frame. In other words, if you want to split a string contained in a data frame column, you need to use separate(), and if you want to split a character vector into a list of character vectors, you need to use str_split(). The n argument of str_split() allows us to specify the maximum length of each returned vector. The basic syntax is str_split(character_vector, separator).

courses = c("INFO5500, INFO6540, INFO6270",
            "INFO5500",
            "INFO5530, INFO5520")

# str_split() splits each string in the vector at the specified delimiter.
# The outcome is a list of three vectors with 3, 1 and 2 elements.
courses %>% 
  str_split(", ")

[[1]]
[1] "INFO5500" "INFO6540" "INFO6270"

[[2]]
[1] "INFO5500"

[[3]]
[1] "INFO5530" "INFO5520"

We can also specify the maximum number of pieces we want to split the string into.

# Here I split the courses vector into a list of vectors that can have a maximum of 2 elements.
courses %>% 
  str_split(", ",n=2)

[[1]]
[1] "INFO5500"           "INFO6540, INFO6270"

[[2]]
[1] "INFO5500"

[[3]]
[1] "INFO5530" "INFO5520"

We can also specify the exact number of pieces we want to split the string into with str_split_fixed(). This function does not return a list but a character matrix, with empty strings filling the columns for which there is no piece.

# I split the courses vector into a matrix with 4 columns.
courses %>% 
  str_split_fixed(", ",n=4)

     [,1]       [,2]       [,3]       [,4]
[1,] "INFO5500" "INFO6540" "INFO6270" ""  
[2,] "INFO5500" ""         ""         ""  
[3,] "INFO5530" "INFO5520" ""         ""
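Since str_split() returns a list, it can help to see how to work with that list. The sketch below uses base R tools (double square brackets and lengths()) to retrieve one element and to count the pieces in each element. The variable names course_list, first_element, and n_courses are only illustrative.

```r
library(stringr)

courses <- c("INFO5500, INFO6540, INFO6270",
             "INFO5500",
             "INFO5530, INFO5520")

course_list <- str_split(courses, ", ")

# Double square brackets retrieve one element of the list (a character vector)
first_element <- course_list[[1]]

# lengths() counts the elements of each vector in the list
n_courses <- lengths(course_list)

print(first_element)
print(n_courses)
```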

4.3.4.1 str_flatten

The str_flatten() function takes a character vector of length x and concatenates all the elements into a character vector of length 1 (a single string) with a specified separator between the elements. In a sense, it is the opposite of str_split(). Its basic syntax is str_flatten(vector, separator).

4.3.4.1.1 Vector example

x <- c("a","b","c")
str_flatten(x,"|")

[1] "a|b|c"
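To illustrate the idea that str_flatten() is the opposite of str_split(), here is a small round-trip sketch. One detail to keep in mind: “|” has a special meaning in regular expressions (it stands for OR, as discussed later in this chapter), so the sketch wraps it in fixed() to have it treated as plain text. The variable names flat and back are only illustrative.

```r
library(stringr)

x <- c("a", "b", "c")

# Flatten the vector into a single string
flat <- str_flatten(x, "|")

# Split it back; fixed() is needed because "|" means OR in a regular expression
back <- str_split(flat, fixed("|"))[[1]]

print(flat)
print(back)
```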

4.3.4.1.2 Tibble example

Using str_flatten() in a tibble is tricky because we need the group_by() function, which we briefly mentioned in the previous chapter but haven’t thoroughly explored yet. It is also counterintuitive, since it likely means that we are taking a tibble in a tidy format and making it untidy.

# Here is a tibble
my_tibble <- tibble(instructor = c("Mongeon, Philippe", "Mongeon, Philippe", "Mongeon, Philippe","Spiteri, Louise","Spiteri, Louise"),
            course = c("INFO5500","INFO6540","INFO6270","INFO6350","INFO6480"))

print(my_tibble)

# A tibble: 5 × 2
  instructor        course  
  <chr>             <chr>   
1 Mongeon, Philippe INFO5500
2 Mongeon, Philippe INFO6540
3 Mongeon, Philippe INFO6270
4 Spiteri, Louise   INFO6350
5 Spiteri, Louise   INFO6480

Now I want to flatten my course column so that I have all the courses taught by the same instructor in a single row and separated with a “|”.

my_tibble %>% 
  group_by(instructor) %>% 
  mutate(course = str_flatten(course, " | ")) %>% 
  unique()

# A tibble: 2 × 2
# Groups:   instructor [2]
  instructor        course                        
  <chr>             <chr>                         
1 Mongeon, Philippe INFO5500 | INFO6540 | INFO6270
2 Spiteri, Louise   INFO6350 | INFO6480

Important

The unique() function at the end of the previous code removes duplicate rows. Because mutate() keeps every original row, each row of a group ends up holding the same flattened string, so the result contains duplicates. You can try it yourself and see what happens when you don’t include the unique() step at the end.
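As an alternative sketch, and not the approach used in this chapter, the dplyr function summarise() returns a single row per group, so the unique() step is not needed. The tibble is recreated here so that the example runs on its own, and the variable name flattened is only illustrative.

```r
library(tidyverse)

my_tibble <- tibble(
  instructor = c("Mongeon, Philippe", "Mongeon, Philippe", "Mongeon, Philippe",
                 "Spiteri, Louise", "Spiteri, Louise"),
  course = c("INFO5500", "INFO6540", "INFO6270", "INFO6350", "INFO6480")
)

# summarise() returns one row per group, so no unique() step is needed
flattened <- my_tibble %>%
  group_by(instructor) %>%
  summarise(course = str_flatten(course, " | "))

print(flattened)
```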

4.3.5 Subsetting strings

4.3.5.1 str_sub

We can retrieve, for example, the first three characters of a string (e.g., a postal code) with the str_sub() function. Its basic syntax is str_sub(string, start, end).

4.3.5.1.1 Vector example

postal_code <- "B3H 4R2"

# get the first three characters of the postal code
postal_code %>% 
  str_sub(1,3)

[1] "B3H"

You can also retrieve the last characters of the string using negative numbers. Let’s get the last three characters of the postal code.

postal_code %>% 
  str_sub(-3,-1)

[1] "4R2"

4.3.5.1.2 Tibble example

# I create my tibble 
t <- tibble(postal_code = c("B3H 4R2", "B3H 7K7"))

# I print my tibble
t

# A tibble: 2 × 1
  postal_code
  <chr>      
1 B3H 4R2    
2 B3H 7K7

# I add two new columns with the first three and the last three characters of the postal code.
t <- t %>% 
  mutate(first_three_digits = str_sub(postal_code, 1, 3),
         last_three_digits = str_sub(postal_code, -3, -1)) 

# I print my new tibble
t

# A tibble: 2 × 3
  postal_code first_three_digits last_three_digits
  <chr>       <chr>              <chr>            
1 B3H 4R2     B3H                4R2              
2 B3H 7K7     B3H                7K7

Notice how I created two new columns with the same mutate()? You can mutate as many things as you want in a single mutate() function. You simply need to add a comma to separate each mutation.

4.3.5.2 str_subset

The str_sub() function should not be confused with the str_subset() function, which returns the elements of a vector that contain a given pattern. Its basic syntax is str_subset(character_vector, string_to_find).

# I create a vector with course codes
course_codes <- c("INFO5500", "BUSI6500", "MGMT5000", "INFO6270")

# I print a vector of course codes that contain the pattern "INFO"
str_subset(course_codes, "INFO")

[1] "INFO5500" "INFO6270"

Caution

Note that you should not try to use the str_subset() function with a tibble. It is possible, but it requires combining multiple functions, and it’s not something that you are likely to need to do anyway.

4.3.6 Locating a pattern in a string

The str_locate() function allows you to find the position of a pattern in a string. This can be useful, for instance, in combination with str_sub() if you want to extract the part of a string that comes before or after the pattern. Let’s explore the str_locate() function with a few examples.

4.3.6.0.1 Vector examples

# I create a string with an email
email <- "info@somewebsite.ca"

# I locate the @ character
email %>% 
  str_locate("@")

     start end
[1,]     5   5

You can see that str_locate() returns a matrix with the beginning and the end of the “@” pattern in the email. If we want to get the part of the string that comes before the “@”, then we can do this:

# I get the first part of the email
str_sub(email, 1,str_locate(email,"@")[,1]-1)

[1] "info"

We did three things there:

We used 1 as the first argument of str_sub() to specify that we want to extract a subset of the email starting with the 1st character.
We used [,1] to obtain the first column in the matrix, which is where our pattern starts (the 5th position).
We subtracted 1 because we don’t want to print characters 1 to 5, which would be “info@” but characters 1 to 4.

So our statement, in English, would read like this: “extract the subset of the email string that starts at the first position and ends one position before where the “@” pattern is located”.

We can get the part that comes after the pattern “@” like this:

email %>% 
  str_sub(str_locate(email,"@")[,2]+1,-1)

[1] "somewebsite.ca"

This reads as “give me the subset of the email string that starts one position after the location of the “@” pattern (str_locate(email,"@")[,2]+1), and ends with the last character of the string (-1)”. Note that the “,-1” part is optional since, by default, the str_sub() function will output the rest of the string when no end position is provided.
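The same two extractions can be written in smaller steps, which may be easier to read. In this sketch, the matrix returned by str_locate() is stored first, and its columns are then selected by name ("start" and "end", as shown in the output above) instead of by number. The variable names at_position, user, and domain are only illustrative.

```r
library(stringr)

email <- "info@somewebsite.ca"

# Store the matrix returned by str_locate()
at_position <- str_locate(email, "@")

# The columns are named, so they can be selected by name instead of by number
user <- str_sub(email, 1, at_position[, "start"] - 1)
domain <- str_sub(email, at_position[, "end"] + 1)

print(user)
print(domain)
```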

4.3.6.0.2 Tibble example

Let’s just repeat the same example but working with a tibble.

# We create a tibble that contains some emails
my_tibble <- tibble(emails = c("info@somewebsite.ca","support@datascienceisfun.com"))

# We print the tibble
print(my_tibble)

# A tibble: 2 × 1
  emails                      
  <chr>                       
1 info@somewebsite.ca         
2 support@datascienceisfun.com

# We remove the part of the emails after the @
my_tibble %>% 
  mutate(emails = str_sub(emails, 1, str_locate(emails,"@")[,1]-1))

# A tibble: 2 × 1
  emails 
  <chr>  
1 info   
2 support

# We remove the part of the emails before the @
my_tibble %>% 
  mutate(emails = str_sub(emails, str_locate(emails,"@")[,2]+1))

# A tibble: 2 × 1
  emails              
  <chr>               
1 somewebsite.ca      
2 datascienceisfun.com

# Let's make this a bit more complex, and print only the part between the "@" and the "."
my_tibble %>% 
  mutate(emails = str_sub(emails, # string to subset
                          str_locate(emails,"@")[,2]+1, # starting position
                          str_locate(emails,"\\.")[,1]-1)) # ending position

# A tibble: 2 × 1
  emails          
  <chr>           
1 somewebsite     
2 datascienceisfun
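One caveat with the last example: str_locate() returns the position of the first “.” in the string, so an address with a dot before the “@” would not give the expected result. Here is a sketch of a two-step variant that handles this case. The first address is hypothetical, and the variable names domains and site_names are only illustrative.

```r
library(stringr)

# The first address is hypothetical and has a "." before the "@"
emails <- c("first.last@somewebsite.ca", "support@datascienceisfun.com")

# Step 1: keep only what comes after the "@"
domains <- str_sub(emails, str_locate(emails, "@")[, 2] + 1)

# Step 2: keep what comes before the first "." of that remainder
site_names <- str_sub(domains, 1, str_locate(domains, "\\.")[, 1] - 1)

print(site_names)
```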

4.3.7 Testing strings

Rather than extracting parts of strings, or modifying strings, you may just want to test whether a string contains a specific pattern and get a logical value (TRUE or FALSE) in return.

4.3.7.1 str_detect

The str_detect() function allows us to identify strings that contain a specific pattern. Its syntax is str_detect(character_vector, string_to_detect).

4.3.7.1.1 Vector example

postal_code = c("B3H 1H5","B3H 382","H2T 1H2","J8P 9R2")
str_detect(postal_code, "B3H")

[1]  TRUE  TRUE FALSE FALSE

This can be useful if we want to filter a tibble based on pattern matches. Here’s an example where we have a list of postal codes and would like to keep only those that are in Halifax.

4.3.7.1.2 Tibble example

# I create a tibble with postal codes
my_tibble <- tibble(postal_code = c("B3H 1H5","B3H 382","H2T 1H2","J8P 9R2"))

# I print the rows for which the postal code contains the pattern "B3H"
my_tibble %>% 
  filter(str_detect(postal_code,"B3H"))

# A tibble: 2 × 1
  postal_code
  <chr>      
1 B3H 1H5    
2 B3H 382
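Because str_detect() returns a logical vector, it can also be used to count matches or to flag the strings that do not match. The sketch below uses sum() and the negate argument of str_detect(). The variable names n_matches and not_b3h are only illustrative.

```r
library(stringr)

postal_code <- c("B3H 1H5", "B3H 382", "H2T 1H2", "J8P 9R2")

# TRUE counts as 1, so sum() gives the number of matching strings
n_matches <- sum(str_detect(postal_code, "B3H"))

# negate = TRUE flags the strings that do NOT contain the pattern
not_b3h <- str_detect(postal_code, "B3H", negate = TRUE)

print(n_matches)
print(not_b3h)
```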

4.3.7.2 str_starts and str_ends

The str_starts() and str_ends() functions do the same thing as str_detect(), but look for the pattern specifically at the beginning or the end of the strings.

# I create a tibble with postal codes
t <- tibble(postal_code = c("B3H 1H5","B3H 382","H2T 1H2","J8P 9R2"))

# I print the postal codes that begin with "B3H"
t %>% 
  filter(str_starts(postal_code, "B3H"))

# A tibble: 2 × 1
  postal_code
  <chr>      
1 B3H 1H5    
2 B3H 382

# I print the postal codes that end with "1H2"
t %>% 
  filter(str_ends(postal_code, "1H2"))

# A tibble: 1 × 1
  postal_code
  <chr>      
1 H2T 1H2
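For comparison, here is a sketch showing how the same tests can be written with str_detect() and the regular expression anchors “^” (start of the string) and “$” (end of the string). Regular expressions are introduced in the next section. The variable names starts_b3h and ends_1h2 are only illustrative.

```r
library(stringr)

postal_code <- c("B3H 1H5", "B3H 382", "H2T 1H2", "J8P 9R2")

# "^" anchors the pattern to the start of the string
starts_b3h <- str_detect(postal_code, "^B3H")

# "$" anchors the pattern to the end of the string
ends_1h2 <- str_detect(postal_code, "1H2$")

print(starts_b3h)
print(ends_1h2)
```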

4.3.8 Regular expressions (regex)

Regular expressions are a powerful way to search for patterns in text. A full understanding of regex is far beyond the scope of this course, but you should at least be aware of them. Below is a very superficial introduction to regular expressions. The cheat sheet for the stringr package (https://github.com/rstudio/cheatsheets/blob/main/strings.pdf) is a great place to look for guidance on using regular expressions (as well as all other functions in the stringr package, several of which I didn’t mention in this chapter but might still be useful). It shows a list of the basic character classes, and all the operators that you can use to search for patterns in strings, so remember that it’s there to help you.

4.3.8.1 Literal expressions

In the code examples above, we used several functions of the stringr package to search for patterns in strings (e.g., searching for the pattern “INFO” in a vector of strings). “INFO” is a literal expression. We can also search for more than one pattern combined with the Boolean operator OR (represented by “|” in a search pattern).
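Here is a minimal sketch of the OR operator, using the course codes vector from the str_subset() example. The variable name selected is only illustrative.

```r
library(stringr)

course_codes <- c("INFO5500", "BUSI6500", "MGMT5000", "INFO6270")

# "|" reads as OR: keep the codes that contain "INFO" or "MGMT"
selected <- str_subset(course_codes, "INFO|MGMT")

print(selected)
```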

4.3.8.2 Character classes

Character classes allow you to search for a range of characters or a type of character (e.g., numbers, punctuation, symbols, letters, or a user-specified set or range of characters). These classes are written between square brackets “[ ]”.

4.3.8.2.1 Example: remove unwanted characters from strings

You can use regular expressions to filter out of a string all the non-alphanumeric characters like this:

messy_string <- " what-is%going*on/with!my(keyboard)"

messy_string %>% 
  str_replace_all("[^[:alnum:]]"," ")

[1] " what is going on with my keyboard "

Why does this work? Because:

[:alnum:] is a character class containing all characters that are alphabetical or numerical (letters and numbers).
[^...] means “everything but” whatever follows the ^ inside the square brackets.

So the statement reads: replace everything but alphanumeric characters with a space.
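The result above still has a space at the beginning and at the end. A possible follow-up, sketched here, is to pass the result to str_squish(), which we saw earlier in this chapter. The variable names spaced and clean_string are only illustrative.

```r
library(stringr)

messy_string <- " what-is%going*on/with!my(keyboard)"

# Step 1: replace everything but alphanumeric characters with a space
spaced <- str_replace_all(messy_string, "[^[:alnum:]]", " ")

# Step 2: remove the leading, trailing, and repeated spaces
clean_string <- str_squish(spaced)

print(clean_string)
```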

4.3.8.2.2 Example: find sequences of characters belonging to specific classes

We can search for specific sequences of character classes, which can be useful to retrieve things like postal codes from a string.

# We create a vector with an address
address <- c("5058 King St, Halifax, NS H2T 1J2","427 Queen Avenue, Halifax, NS, B3H1H4") 

# We extract the postal code from the address  
address %>% 
  str_extract("[:alpha:][:digit:][:alpha:] ?[:digit:][:alpha:][:digit:]")

[1] "H2T 1J2" "B3H1H4"

The pattern [:alpha:][:digit:][:alpha:] reads as “any letter, followed by any digit, followed by any letter”. The [:digit:][:alpha:][:digit:] pattern reads as “any digit, followed by any letter, followed by any digit”.

You might have noticed that there is a space and a question mark between my two sets of three character classes. This reads as “0 or 1 space” (see the quantifiers section in the stringr cheatsheet). This allows the pattern to also match postal codes that are written with no space between the two sets of three characters.
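Once the postal codes are extracted, the two spelling variants (with and without a space) can be made consistent with functions covered earlier in this chapter. The following is only a sketch, and it assumes that every extracted code has exactly six characters plus an optional space. The variable names compact and normalized are only illustrative.

```r
library(stringr)

postal_codes <- c("H2T 1J2", "B3H1H4")

# Remove the optional space, then rebuild each code with exactly one space
compact <- str_remove(postal_codes, " ")
normalized <- str_c(str_sub(compact, 1, 3), " ", str_sub(compact, 4, 6))

print(normalized)
```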

4.3.8.2.3 Example: search for spelling variants

Another convenient way of using character classes is when you want to match a word in a text that is or isn’t capitalized. Here’s an example.

# We create a tibble with 3 strings
my_tibble <- tibble(text = c("Information management is great", "I love information management", "Wayne Gretzy was the best hockey player of all times"))

# We print the tibble
print(my_tibble)

# A tibble: 3 × 1
  text                                                
  <chr>                                               
1 Information management is great                     
2 I love information management                       
3 Wayne Gretzy was the best hockey player of all times

# We select the texts that contain "information management" or "Information management".
my_tibble %>% 
  filter(str_detect(text, "[Ii]nformation management"))

# A tibble: 2 × 1
  text                           
  <chr>                          
1 Information management is great
2 I love information management

4.3.8.2.4 Example: combining multiple search terms with “|” (boolean OR)

Instead of using character classes, we could combine multiple search terms with the “|” that represents the Boolean operator OR.

my_tibble %>% 
  filter(str_detect(text, "information management|Information management"))

# A tibble: 2 × 1
  text                           
  <chr>                          
1 Information management is great
2 I love information management

This works, but even with just two variants, you can already tell that it makes for longer statements.
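A third option, shown here as a sketch, is to wrap the pattern in the regex() function of stringr with ignore_case = TRUE, which matches any combination of uppercase and lowercase letters. The third string in the tibble is hypothetical and was added to show a variant that the two previous patterns would miss. The variable name matches is only illustrative.

```r
library(tidyverse)

my_tibble <- tibble(text = c("Information management is great",
                             "I love information management",
                             "INFORMATION MANAGEMENT matters"))

# regex() with ignore_case = TRUE matches any mix of upper and lower case
matches <- my_tibble %>%
  filter(str_detect(text, regex("information management", ignore_case = TRUE)))

print(matches)
```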

4.3.8.2.5 Example: searching for a range of characters

# I create a tibble containing letters from a to g
my_tibble <- tibble(letters = c("a","b","c","d","e","f","g"))

# I retrieve rows that contain letters from a to f
my_tibble %>% 
  filter(str_detect(letters,"[a-f]"))

# A tibble: 6 × 1
  letters
  <chr>  
1 a      
2 b      
3 c      
4 d      
5 e      
6 f

Again, we could have used “a|b|c|d|e|f” but this is less efficient. Here’s a similar example where we have lowercase and uppercase letters.

# I create a tibble containing letters from a to g in lowercase and uppercase.
my_tibble <- tibble(letters = c("a","b","c","d","e","f","g",
                        "A","B","C","D","E","F","G"))

# I retrieve rows that contain the letters a to d in lowercase or uppercase
my_tibble %>% 
  filter(str_detect(letters, "[a-dA-D]"))

# A tibble: 8 × 1
  letters
  <chr>  
1 a      
2 b      
3 c      
4 d      
5 A      
6 B      
7 C      
8 D

4.3.8.3 Beware of the dot, it’s a wild card

When matching character patterns, the “.” means any character.

string <- "This is a string" 

# I extract every character
str_extract_all(string, ".")[[1]]

 [1] "T" "h" "i" "s" " " "i" "s" " " "a" " " "s" "t" "r" "i" "n" "g"

# I replace every character with a space
string %>% 
  str_replace_all("."," ")

[1] "                "
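If you want to match an actual dot rather than any character, the dot needs to be escaped (see the next section) or the pattern needs to be wrapped in fixed(), which treats it as plain text. Here is a short sketch of both options. The example string and the variable names escaped and literal are only illustrative.

```r
library(stringr)

string <- "This. Is. A. String."

# "\\." matches a literal dot
escaped <- str_remove_all(string, "\\.")

# fixed() treats the whole pattern as plain text, with the same result here
literal <- str_remove_all(string, fixed("."))

print(escaped)
print(literal)
```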

4.3.9 Dealing with special characters in strings

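Here are some of the characters that you might come across when working with strings in R. Some have a special meaning in regular expressions (e.g., “.”, “?”, and parentheses), and others have a special meaning in R strings (e.g., quotation marks and the backslash), so they need to be preceded by the escape character “\”. Because the backslash is itself a special character in R strings, it must be doubled when you write a pattern. Here is a table adapted from the stringr cheatsheet.

Escaped form	Represents	How to write it in a pattern
\.	.	\\.
\!	!	\\!
\?	?	\\?
\(	(	\\(
\)	)	\\)
\{	{	\\{
\}	}	\\}
\n	newline	\\n
\t	tab	\\t
\\	backslash \	\\\\
\'	apostrophe '	\\'
\"	quotation mark "	\\"
\`	backtick `	\\`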

Here are just a few examples so you can see how R deals with these special characters.

string <- "Dear diary\nWhat is wrong with me\nMy code never works as I intend"

# If we just print the string, we see it exactly as written.
print(string)

[1] "Dear diary\nWhat is wrong with me\nMy code never works as I intend"

The writeLines() function can be used to print the string where escaped characters are interpreted.

writeLines(string)

Dear diary
What is wrong with me
My code never works as I intend
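To see that an escaped character such as \n counts as a single character, and that it can be used as a separator, here is a small sketch. The shortened string and the variable names n_chars and lines are only illustrative.

```r
library(stringr)

string <- "Dear diary\nWhat is wrong with me"

# "\n" is typed with two keystrokes but counts as a single character
n_chars <- str_length("a\nb")

# Splitting on the newline gives one element per line
lines <- str_split(string, "\n")[[1]]

print(n_chars)
print(lines)
```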

Let’s read a text file (.txt) in R and see what happens.

url <- "https://pmongeon.github.io/info6270/files/boring_story.txt"

# reads the file and produces a vector with one element for each line
read_lines(url)

[1] "This is a \"story\" that I wrote just for the INFO6270 course."               
[2] "It's a bit of a boring story, but it's just an example. So please forgive me."
[3] "...and they were happy ever after.\tThe end."

# reads the file and produces a vector with a single element containing the entire content
read_file(url)

[1] "This is a \"story\" that I wrote just for the INFO6270 course.\nIt's a bit of a boring story, but it's just an example. So please forgive me.\n...and they were happy ever after.\tThe end."

# Let's read the whole file and print it with writeLines()
read_file(url) %>% 
  writeLines()

This is a "story" that I wrote just for the INFO6270 course.
It's a bit of a boring story, but it's just an example. So please forgive me.
...and they were happy ever after.  The end.

4.3.10 Summary

This chapter introduced you to the stringr package and the general principles of manipulating and matching character patterns in R. The goal was to give you enough of the basics so that you can fix small issues with strings in the data that you might encounter in this course, and in your professional or personal lives.