Python String

In this tutorial, you will learn what sequential data types are, how they are classified, and which operations you can perform on strings in the Python programming language.

What is a Sequential Datatype?

In Python, sequence types are one of the principal built-in data types, alongside numerics, mappings, classes, instances, and exceptions.

A sequence is a collection of objects arranged in a particular order, so that every item has a definite position and can be accessed by that position.

Classification of Sequence Datatypes

Python has six built-in sequential data types: string (str), list, tuple, range, bytes, and bytearray. Among these, the most commonly used are strings, lists, and tuples.
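As a quick illustration, the sketch below creates one value of each of the six sequence types and shows that all of them support len() and indexing by position. The variable name sequences is only illustrative.

```python
# One value of each built-in sequence type listed above.
sequences = ['PYTHON', ['P', 'Y'], ('P', 'Y'), range(3), b'PY', bytearray(b'PY')]

for seq in sequences:
    # Every sequence has a length and can be indexed by position.
    print(type(seq).__name__, len(seq), seq[0])
```

Note that indexing a bytes or bytearray object returns an integer (80, the byte value of 'P'), whereas indexing a string returns a one-character string.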


String Datatype in Python

In programming, a sequence of characters is called a string.

Characters can be letters, digits, symbols, spaces, or punctuation marks. Anything, visible or invisible, that can be typed on a computer is considered a character. For instance, 'Python' is a string of six characters, all of which are alphabetic letters.

Unlike human beings, a computer understands only binary data, that is, 0s and 1s. Characters that are readable to the user therefore have to be converted into binary before the computer can store or process them. Converting characters into binary is called encoding, and converting binary back into characters is called decoding. Two widely used character standards are ASCII and Unicode; Unicode text is most commonly encoded as UTF-8.
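The encoding and decoding steps described above can be seen directly in Python. This sketch assumes UTF-8 as the encoding; the sample text and variable names are only illustrative.

```python
text = "Python é"

# Encoding: str -> bytes (here using the UTF-8 encoding of Unicode).
encoded = text.encode("utf-8")
print(encoded)                  # b'Python \xc3\xa9'

# Decoding: bytes -> str, using the same encoding.
decoded = encoded.decode("utf-8")
print(decoded == text)          # True

# ord() gives a character's Unicode code point; format() shows it in binary.
print(ord("P"))                 # 80
print(format(ord("P"), "08b"))  # 01010000
```

The accented 'é' becomes two bytes in UTF-8, while plain ASCII letters such as 'P' take one byte each.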

In Python, a string is an ordered sequence of Unicode characters. An important feature of strings is their immutability, which means a string cannot be changed after it has been created.

How to define a string in Python

Python strings are defined by enclosing text in either single quotes ('...') or double quotes ("...").

Example: Defining string using single and double quotes

Str1 = 'Hello'
Str2 = "Welcome"
 
print(Str1)
print(Str2) 

Output:

Hello
Welcome
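The choice between single and double quotes does not change the resulting string. The short sketch below, with illustrative variable names, checks this and shows the practical reason to pick one style over the other: the other quote character can then appear inside the text without escaping.

```python
a = 'Hello'
b = "Hello"
print(a == b)            # True: the quote style does not affect the value
print(type(b).__name__)  # str

# Using one quote style lets the other appear inside the text unescaped.
c = "It's Python"
d = 'She said "hi"'
print(c)                 # It's Python
print(d)                 # She said "hi"
```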

Triple Quotes in Python String

In Python, triple quotes are used to write strings that span multiple lines. You can use either three consecutive single quotes ('''...''') or three consecutive double quotes ("""...""").

Example: defining a string in multiple lines

Str3 =  '''...
Python programming Language
multiline in three single quotes
... '''

Str4 =""".....
Multiline string example 
in three double quotes
....."""
print(Str3)
print(Str4) 

Output:

...
Python programming Language
multiline in three single quotes
...
.....
Multiline string example
in three double quotes
.....
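Behind the scenes, a triple-quoted string simply contains a newline character wherever the source text breaks a line. The sketch below uses repr() to make those newlines visible; the variable names are only illustrative.

```python
s = """first line
second line"""
print(repr(s))        # 'first line\nsecond line'
print(s.count("\n"))  # 1

# A backslash right after the opening quotes suppresses the first newline.
t = """\
no leading newline"""
print(repr(t))        # 'no leading newline'
```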

Indexing and Slicing to access Strings

Like many other programming languages, Python treats a string as an array of Unicode characters. Python has no separate character type; a single character is simply a string of length 1. You can access characters either individually, using indexing, or in groups, using slicing.

What is indexing in a Python string?

Indexing is the process of numbering the position of each character in a sequence so that any character can be looked up directly. An index must always be an integer.

Because a string is a sequence of Unicode characters, each character position has a corresponding index number, just as in an array. Python supports two types of indexing:

  • Positive Indexing – indexing starts with 0 and traverses in the forward direction from head to tail.
  • Negative Indexing – indexing starts with -1 and traverses in the backward direction from tail to head.

For instance, the string PYTHON consists of the characters P, Y, T, H, O, and N, so its length is 6. The table below shows the positive and negative index of each character.

Character        P    Y    T    H    O    N
Positive index   0    1    2    3    4    5
Negative index  -6   -5   -4   -3   -2   -1
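The two numbering schemes are related by a simple rule: for a string of length n, the character at positive index i also has negative index i - n. The loop below prints each character with both of its indices (enumerate() pairs each character with its position).

```python
S = 'PYTHON'
n = len(S)  # 6

for i, ch in enumerate(S):
    # i is the positive index; i - n is the matching negative index.
    print(ch, i, i - n)
```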

Getting characters through positive indexing

Using the index number inside the index operator [ ], you can access individual characters. The example below extracts single characters through positive indexing.

Example: Accessing character through positive indexing

S = 'PYTHON'
print('S[0] = ',S[0])
print('S[4] = ',S[4]) 

Output:

S[0] =  P
S[4] =  O

Getting characters through negative indexing

Suppose you have a long string and want to access a character near its end. Python supports counting backward from the tail to the head: negative indexing always starts with -1, which refers to the last character.

Example: Accessing character through negative indexing

S = 'PYTHON'
print('S[-1] = ',S[-1])
print('S[-5] = ',S[-5]) 

Output:

S[-1] =  N
S[-5] =  Y

Errors while indexing a string

Two common errors raised while indexing are:

  • TypeError: raised when the index is not an integer, for example a float, a complex number, or a string.
  • IndexError: raised when the index is outside the valid range, that is, not between -len(S) and len(S) - 1 for a string S.

Example: Errors while indexing

S = 'PYTHON'
print('S[4.5] = ', S[4.5])   # raises TypeError: the index is a float
print('S[41] = ', S[41])     # raises IndexError: index 41 is beyond the end of the string

Output (the first error stops the program, so each print statement was run on its own):

   print('S[4.5] = ', S[4.5])
TypeError: string indices must be integers

   print('S[41] = ', S[41])
IndexError: string index out of range
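In a real program you usually want to handle these errors rather than let them stop execution. The sketch below catches IndexError in a small helper; char_at is a hypothetical name used only for illustration, not a built-in. It also shows that a float index must be converted to an integer first.

```python
S = 'PYTHON'

def char_at(s, i):
    """Hypothetical helper: return s[i], or None if i is out of range."""
    try:
        return s[i]
    except IndexError:
        return None

print(char_at(S, 4))    # O
print(char_at(S, 41))   # None
print(char_at(S, -6))   # P

# A float index must be converted to int first; int() truncates 4.5 to 4.
print(S[int(4.5)])      # O
```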

What is slicing in a Python string?

So far, we have discussed how to access a single character of a string using the index operator [ ]. We can also extract several characters at once by using the range slice operator [:].

As its name suggests, slicing cuts a section out of a sequence. In a Python string, the index operator [ ] returns a substring of length one, while the range slice operator [:] returns a substring of arbitrary length.

The syntax for range slicing is:

S[m:n]

where S is the string, m is the starting index, and n is the ending index.

S[m:n] returns the substring from index m up to, but not including, index n.

To visualize slicing, think of the indices as pointing between the characters. For example, in the string PYTHON the character P lies between indices 0 and 1, and 'PY' lies between indices 0 and 2:

 +---+---+---+---+---+---+
 | P | Y | T | H | O | N |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
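Two consequences of the "up to, but not including, n" rule are worth checking in code: a slice S[m:n] always has n - m characters (when both bounds are in range), and splitting a string at any index n loses nothing. The sketch also shows that, unlike indexing, slicing does not raise IndexError for out-of-range bounds; the values of m and n are only illustrative.

```python
S = 'PYTHON'
m, n = 0, 2

print(S[m:n])          # PY
print(len(S[m:n]))     # 2, which is n - m

# Because index n is excluded, S[:n] and S[n:] split S with no overlap.
print(S[:n] + S[n:])   # PYTHON

# Unlike indexing, slicing clamps out-of-range bounds instead of raising IndexError.
print(S[4:100])        # ON
```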

Slicing through positive indexing

The table below illustrates slicing with positive indices on the string "python world", whose length is 12.

Slicing expression   Length   Output        Remarks
S[0:5]               5        'pytho'       characters at indices 0 to 4; index 5 excluded
S[0:6]               6        'python'      characters at indices 0 to 5; index 6 excluded
S[7:11]              4        'worl'        characters at indices 7 to 10; index 11 excluded
S[6:12]              6        ' world'      characters at indices 6 (a space) to 11; index 12 excluded
S[:8]                8        'python w'    starts from the head by default; index 8 excluded
S[8:]                4        'orld'        runs to the tail by default

Example: slicing substring from string ‘python world’ using positive indexing

S ="python world"
print("S[0:5] = ",S[0:5])
print("S[0:6] = ",S[0:6])
print("S[7:11] = ",S[7:11])
print("S[6:12] = ",S[6:12])  
print("S[:8] = ",S[:8])
print("S[8:] = ",S[8:]) 

Output:

S[0:5] =  pytho
S[0:6] =  python
S[7:11] =  worl
S[6:12] =   world
S[:8] =  python w
S[8:] =  orld

Slicing using Negative indexing

The table below illustrates slicing with negative indices on the string "python world".

Slicing expression   Length   Output          Remarks
S[-12:-7]            5        'pytho'         characters at indices -12 to -8; index -7 excluded
S[-7:-3]             4        'n wo'          characters at indices -7 to -4; index -3 excluded
S[-12:]              12       'python world'  starts at index -12 (the head) and runs to the tail by default
S[:-7]               5        'pytho'         starts from the head by default and stops before index -7

Example: Slicing substring from string ‘python world’ using negative indexing

S ="python world"
print("S[-12:-7] = ",S[-12:-7])
print("S[-7:-3] = ",S[-7:-3])
print("S[-12:] = ",S[-12:])
print("S[:-7] = ",S[:-7]) 

Output:

S[-12:-7] =  pytho
S[-7:-3] =  n wo
S[-12:] =  python world
S[:-7] =  pytho
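Positive and negative indices can also be combined, and a few negative-slice patterns come up often in practice. The sketch below uses the same string; k is an illustrative variable.

```python
S = "python world"

# Positive and negative indices can be mixed in a single slice.
print(S[7:-1])     # worl

# S[-k:] keeps the last k characters; S[:-k] drops them.
k = 5
print(S[-k:])      # world
print(S[:-k])      # python  (with a trailing space)

# A negative index -i names the same position as len(S) - i.
print(S[-12:-7] == S[len(S) - 12:len(S) - 7])   # True
```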

How to modify a string?

Strings are immutable: once a string has been created, its characters cannot be changed. However, you can update a variable by reassigning it to another string.

Example: Reassigning string variable

V = 'Python World'
print('Initially Variable V is assigned to :',V)
V = ' Python programming'
print("Variable V is reassigned to :",V) 

Output:

Initially Variable V is assigned to : Python World
Variable V is reassigned to :  Python programming
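The difference between modifying a string and reassigning a variable becomes clear when you try to change a single character. The sketch below shows the TypeError that item assignment raises, and the usual workaround of building a new string from slices.

```python
V = 'Python World'

try:
    V[0] = 'J'              # strings do not support item assignment
except TypeError as e:
    print('TypeError:', e)  # TypeError: 'str' object does not support item assignment

# To "change" a character, build a new string and rebind the name to it.
V = 'J' + V[1:]
print(V)                    # Jython World
```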

Similarly, you can delete a string variable with the del statement. Removing individual characters from a string, however, is not possible, because strings are immutable. The syntax for deleting a string variable is as follows:

del variablename

Example: Deleting a string

V = 'Python World'
del V
print(V)   # raises NameError, because the name V no longer exists

Output:

    print(V)
NameError: name 'V' is not defined
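Strictly speaking, del removes a name rather than the string itself: if another variable still refers to the same string, that string remains usable. A small sketch, with W as an illustrative second name:

```python
V = 'Python World'
W = V        # a second name for the same string object

del V        # removes only the name V, not the string object itself
print(W)     # Python World

try:
    print(V)
except NameError as e:
    print('NameError:', e)   # NameError: name 'V' is not defined
```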

Python String Functions

Python has several built-in functions that work with strings. The most commonly used one is len(). Another widely used function, enumerate(), will be discussed in a later tutorial.

The len() function returns the length of a string, that is, the number of characters it contains.

Example: String function

#String Function
S = 'Python World'   # avoid naming a variable str, which would hide the built-in str type
print('Length of the string is', len(S))

Output:

Length of the string is 12
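Because len() counts characters, not bytes, a string containing non-ASCII characters can be shorter than its encoded form. The sketch below assumes UTF-8 for the encoded length; the variable name word is only illustrative.

```python
word = 'café'

print(len(''))                     # 0, the empty string
print(len('Python World'))         # 12, the space counts as a character
print(len(word))                   # 4 characters
print(len(word.encode('utf-8')))   # 5 bytes: 'é' takes two bytes in UTF-8
```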

Python String Operators and Operations

Python supports a variety of operations on strings, and many of them have their own dedicated operators. The most important and commonly used string operators are explained in this section.

Checking Substring – Membership Operator

Membership operators check whether a substring exists in a string. The result is always a Boolean value: True if the substring is found and False otherwise. Python has two membership operators:

  • in: returns True if the substring exists in the string
  • not in: returns True if the substring does not exist in the string

Example: Membership validation

#Membership operator
S = 'Python Member'

A = 'M' in S
B = 'Me' not in S

print(A)
print(B) 

Output:

True
False
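Two details of the membership operators are easy to miss: the test is case-sensitive, and the empty string counts as a substring of every string. A short sketch:

```python
S = 'Python Member'

print('member' in S)           # False: membership tests are case-sensitive
print('member' in S.lower())   # True once S is lowercased
print('' in S)                 # True: the empty string is a substring of every string
```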

Concatenation and Repetition of Strings

Concatenation is one of the fundamental string operations: it joins two or more strings to form a new string. In Python, the concatenation operator is the plus sign (+). Note that with numbers, + performs addition, while with strings it joins them together.

Example: String concatenation using + operator

#String Concatenation using +
s1 = 'Python'
s2 = 'World'
print('String after concatenation :',s1+s2) 

Output:

String after concatenation : PythonWorld
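As the output shows, + inserts nothing between its operands, so any separator has to be added explicitly. The sketch below also shows that += produces a new string rather than changing the old one; s3 is an illustrative extra name.

```python
s1 = 'Python'
s2 = 'World'

# + inserts nothing between its operands, so add a separator yourself.
print(s1 + ' ' + s2)   # Python World

# += creates a new string and rebinds s1; s3 still refers to the original.
s3 = s1
s1 += '!'
print(s1, s3)          # Python! Python
```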

Similarly, the * operator repeats a string a given number of times and joins the copies together.

Example: String repetition using * operator

#String repetition using *
s1 = 'Python'
print('s1*3 =',s1*3) 

Output:

s1*3 = PythonPythonPython
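A few more repetition cases, sketched below: the integer may appear on either side of *, and a count of zero or less produces an empty string rather than an error.

```python
print('-' * 10)          # ----------
print(3 * 'ab')          # ababab: the integer may come first
print(repr('ab' * 0))    # '': a count of zero gives an empty string
print(repr('ab' * -2))   # '': so does a negative count
```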

Note: Because strings are immutable, concatenation does not change the original strings; it creates a new string, which must be assigned to a variable if you want to keep it.

Also note that Python does not implicitly convert other types to strings. Hence, concatenating a string with a non-string value, such as a number or a Boolean, raises a TypeError.

Example: String concatenation: Type Error

s1 = 'Python'
print( s1+3) 

Output:

print( s1+3)
TypeError: can only concatenate str (not "int") to str

Note: The + operator can concatenate a string only with another string. Convert other values explicitly, for example with str(), before concatenating.
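The fix for the TypeError above is an explicit conversion. A minimal sketch using the built-in str(); the variable name version is only illustrative.

```python
s1 = 'Python'
version = 3

# Convert non-string values explicitly with str() before concatenating.
print(s1 + str(version))      # Python3
print(s1 + ' ' + str(True))   # Python True
```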

String Formatting in Python

What is an Escape Character in String

Recall that in Python a string is delimited by either single quotes or double quotes. What happens if you try to print text such as It's a "python program", which contains a single quote as well as double quotes? Whichever quote character you choose as the delimiter also appears inside the text, so Python ends the string too early and raises a SyntaxError.

One way to solve this problem is to use triple quotes: either three consecutive single quotes or three consecutive double quotes. The other option is to use escape characters.

In Python strings, the backslash (\) is the escape character. Placing it before a special character, such as a single or double quote, tells Python to treat that character as ordinary text instead of as the end of the string.

For instance, 'It\'s a \"python program\"' produces the same text as It's a "python program". Each quote inside the text is prefixed with a backslash, which avoids the SyntaxError and allows the quotes to be printed.
 

Example : String formatting : Escape Character


#String formatting
print('''it's a "python program"!!!''')       #Triple Quotes
print('it\'s a \"python program\"!!!')     #Escape Character
 

Output:


it's a "python program"!!!
it's a "python program"!!!

The backslash is also used to represent whitespace and control characters such as tab (\t), newline (\n), and carriage return (\r).

Example : String formatting : Escape Character


print('Straw\tBerry')
print("Mul\nBerry")
 

Output:


Straw   Berry
Mul
Berry

The table below lists more of Python's escape sequences.

Escape sequence   Meaning
\'                Single quote
\"                Double quote
\n                Newline (linefeed)
\t                Horizontal tab
\v                Vertical tab
\r                Carriage return
\b                Backspace
\a                Bell
\f                Formfeed
\\                Backslash
\ooo              Character with octal value ooo
\xhh              Character with hexadecimal value hh
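A few of the escape sequences from the table, checked in code. Note that each escape sequence, however many characters it takes to write, produces a single character in the string.

```python
print('\x48\x69')    # Hi: hex 48 is 'H', hex 69 is 'i'
print('\110\151')    # Hi: octal 110 is 'H', octal 151 is 'i'
print('C:\\temp')    # C:\temp: \\ produces a single backslash
print(len('\n'))     # 1: an escape sequence is a single character
```

Without the doubled backslash, 'C:\temp' would contain a tab character in place of \t.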

What is a raw string in Python?

Python can also treat a backslash as an ordinary character by using a raw string. A raw string is a normal string literal prefixed with the letter r or R. Unlike in a normal string, a backslash has no special meaning in a raw string, so escape sequences are not processed.

Example : Raw String


#Raw String Example
print("Hi\tWelcome To \n PYTHON \x48 WORLD! ")
print(R"Hi\tWelcome To \n PYTHON \xWORLD! ")

Output:


Hi      Welcome To
 PYTHON H WORLD!
Hi\tWelcome To \n PYTHON \xWORLD!

Let us examine the example above more closely. The first string contains three escape sequences, \t, \n, and \x48, which produce a tab, a newline, and the character H (hexadecimal 48), as the output shows. The second string is marked as a raw string and also contains \t, \n, and \x. On its own, \x is not a valid escape sequence, because it must be followed by two hexadecimal digits, yet the program prints the string without raising any error. This is because a raw string does not process escape sequences at all. If the string is not marked as raw, the invalid \x raises an error, as shown below:


print("Hi\tWelcome To \n PYTHON \x WORLD! ")
          ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 25-26: truncated \xXX escape
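A common practical case is a Windows-style path, where sequences like \n and \t would otherwise be interpreted. The sketch below compares a normal and a raw literal; the path itself is only an illustrative value.

```python
normal = 'C:\new\table'   # \n and \t become a newline and a tab
raw = r'C:\new\table'     # backslashes are kept as ordinary characters

print(len(normal))               # 10
print(len(raw))                  # 12
print(raw == 'C:\\new\\table')   # True
print(raw)                       # C:\new\table
```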
 

String Methods in Python

To manipulate strings efficiently, Python offers a wide set of built-in methods, which are tabulated below.

Sr.No. | Method | Description
1 | capitalize() | Returns a copy of the string with the first character uppercased and the rest lowercased.
2 | center(width[, fillchar]) | Returns the string centered in a string of length width, padded with fillchar (a space by default).
3 | count(sub[, start[, end]]) | Counts the non-overlapping occurrences of sub in the string, or in the slice [start:end] if start and end are given.
4 | bytes.decode(encoding, errors) | Decodes a bytes object into a string using the given encoding. In Python 3, decode() is a method of bytes, not of str.
5 | encode(encoding, errors) | Returns an encoded version of the string as a bytes object; by default an encoding error raises UnicodeEncodeError, a subclass of ValueError.
6 | endswith(suffix[, start[, end]]) | Returns True if the string (or the slice [start:end]) ends with suffix, otherwise False.
7 | expandtabs(tabsize=8) | Replaces tab characters with spaces, using tab stops every tabsize columns (8 by default).
8 | find(sub[, start[, end]]) | Returns the lowest index at which sub is found in the string (or in the slice [start:end]), or -1 if it is not found.
9 | index(sub[, start[, end]]) | Like find(), but raises ValueError if sub is not found.
10 | isalnum() | Returns True if all characters are alphanumeric and there is at least one character, otherwise False.
11 | isalpha() | Returns True if all characters are alphabetic and there is at least one character, otherwise False.
12 | isdigit() | Returns True if all characters are digits and there is at least one character, otherwise False.
13 | islower() | Returns True if all cased characters are lowercase and there is at least one cased character, otherwise False.
14 | isnumeric() | Returns True if all characters are numeric and there is at least one character, otherwise False.
15 | isspace() | Returns True if the string contains only whitespace characters and there is at least one character, otherwise False.
16 | istitle() | Returns True if the string is titlecased and there is at least one character, otherwise False.
17 | isupper() | Returns True if all cased characters are uppercase and there is at least one cased character, otherwise False.
18 | join(iterable) | Joins the strings in iterable into one string, using the string the method is called on as the separator.
19 | len(string) | Built-in function (not a method): returns the number of characters in the string.
20 | ljust(width[, fillchar]) | Returns the string left-justified in a string of length width, padded with fillchar (a space by default).
21 | lower() | Converts all uppercase letters in the string to lowercase.
22 | lstrip([chars]) | Removes leading whitespace (or the characters in chars) from the string.
23 | maketrans() | Static method that returns a translation table for use with translate().
24 | max(string) | Built-in function: returns the character with the highest Unicode code point.
25 | min(string) | Built-in function: returns the character with the lowest Unicode code point.
26 | replace(old, new[, count]) | Replaces all occurrences of old with new, or only the first count occurrences if count is given.
27 | rfind(sub[, start[, end]]) | Like find(), but returns the highest index at which sub is found.
28 | rindex(sub[, start[, end]]) | Like index(), but returns the highest index at which sub is found.
29 | rjust(width[, fillchar]) | Returns the string right-justified in a string of length width, padded with fillchar (a space by default).
30 | rstrip([chars]) | Removes trailing whitespace (or the characters in chars) from the string.
31 | split(sep=None, maxsplit=-1) | Splits the string at sep (at runs of whitespace by default) and returns a list of substrings; at most maxsplit splits are made if it is given.
32 | splitlines(keepends=False) | Splits the string at line boundaries and returns a list of lines; the line breaks are removed unless keepends is True.
33 | startswith(prefix[, start[, end]]) | Returns True if the string (or the slice [start:end]) starts with prefix, otherwise False.
34 | strip([chars]) | Removes both leading and trailing whitespace (or the characters in chars); equivalent to applying lstrip() and rstrip().
35 | swapcase() | Inverts the case of all letters in the string.
36 | title() | Returns a titlecased version of the string, in which each word starts with an uppercase letter and the remaining letters are lowercase.
37 | translate(table) | Returns a copy of the string in which each character is mapped through table, typically created with maketrans().
38 | upper() | Converts all lowercase letters in the string to uppercase.
39 | zfill(width) | Returns the string left-padded with zeros to a total length of width; a leading sign (+ or -) stays in front of the zeros. Intended for numeric strings.
40 | isdecimal() | Returns True if all characters are decimal characters and there is at least one character, otherwise False.
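The sketch below exercises several methods from the table on one illustrative string, including the difference between find() (returns -1) and index() (raises ValueError) when a substring is missing.

```python
S = '  Python World  '

t = S.strip()                 # remove surrounding whitespace
print(repr(t))                # 'Python World'
print(t.find('World'))        # 7
print(t.find('Java'))         # -1: find() signals "not found" with -1
try:
    t.index('Java')           # index() raises instead
except ValueError:
    print('index(): not found')

print(t.count('o'))           # 2
print(t.startswith('Py'))     # True

words = t.split()             # split on whitespace
print(words)                  # ['Python', 'World']
print('-'.join(words))        # Python-World

print(t.center(16, '*'))      # **Python World**
print('-42'.zfill(5))         # -0042
print(max(t))                 # y: the character with the highest code point
```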

Examples of some of the most frequently used methods, such as lower(), upper(), join(), split(), format(), and replace(), are given below.

lower() – returns the lower case string


S = 'Python world'
print(S.lower())
 

python world

upper() – returns the upper case string


S = 'Python world'
print(S.upper())
 

PYTHON WORLD

replace() – returns a copy of the string in which occurrences of one substring are replaced with another


S = 'Python world'
print(S.replace('world','program'))
 

Python program
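replace() also accepts an optional count that limits how many occurrences are replaced. Like every string method, it returns a new string and leaves the original unchanged. The sample string below is only illustrative.

```python
S = 'one two one two'

print(S.replace('one', '1'))     # 1 two 1 two
print(S.replace('one', '1', 1))  # 1 two one two: only the first occurrence
print(S)                         # one two one two: the original is unchanged

# lower() is handy for case-insensitive comparisons.
print('PyThOn'.lower() == 'python')   # True
```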