Before Selenium, the functionality of a web application was tested manually, which took many hours. The testing relied on different scenarios. Each scenario was treated as a test case that exercised the behavior of the web app before its deployment. These test cases were run on various browsers to reveal any issues in the source code.
It required a dedicated team of testers to check all test cases. Accuracy and time are major constraints in web development, which has led to automated test cases that can be used in different web applications without changing the source code. Selenium was developed to automate test cases.
This first chapter of the book offers a complete overview of Selenium and its core architectural design. The subtopics explain how to use Selenium and compare it to other testing tools in the domain. Later in the chapter, integrating Python with Selenium is explained. Let’s start with a brief history and description of the Selenium tool and the reasons to use it.
What Is Selenium?
Selenium was created in 2004 by Jason Huggins at ThoughtWorks to test a web application named Time and Expenses. The tool was developed to test the front-end behavior of an application in various browsers. It became popular and was released as open source. The increase in demand for automated testing led to the development of several versions of Selenium over the years, which are discussed next.
Selenium Tools and Versions
Any one (or more) of these Selenium tools can be used by an organization; the choice depends on the test environment’s requirements. The first tool developed by ThoughtWorks was Selenium Core in 2004. It allowed the tester to write their own code/script to automate front-end interactions like keyboard or mouse activities. These activities were similar to a user interacting with the application.
In web application security, browsers enforce a policy that lets a script access data only in web pages from the same origin; this is generally known as the same-origin policy and is referred to here as the same-host policy. A same-host policy only allows a test case to access pages within the same domain. For example, a test script can access multiple pages within www.apress.com, such as www.apress.com/in/python and www.apress.com/in/about, because of the same-host policy; however, this policy does not allow access to pages from different sites, such as https://google.com or https://wikipedia.org.
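The check behind this policy can be sketched in a few lines of Python. Browsers generally compare the scheme, host, and port of two URLs (together called the origin). The helper names `origin` and `same_origin` are hypothetical and chosen for illustration; this is a simplified model, not the code a browser actually runs.

```python
from urllib.parse import urlsplit

# Default ports, so "https://example.com" and "https://example.com:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """Return True when both URLs share scheme, host, and port."""
    return origin(url_a) == origin(url_b)


# Two pages on the same site share an origin.
print(same_origin("https://www.apress.com/in/python",
                  "https://www.apress.com/in/about"))   # True
# A page on another site does not.
print(same_origin("https://www.apress.com/in/python",
                  "https://google.com"))                # False
```

Under this model, a script loaded from one origin is blocked from reading pages of another, which is the restriction the early Selenium tools had to work around.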
Due to the same-host policy, access to code elements is denied or blocked when using external scripts. To avoid this complication, Huggins and Paul Hammant developed a server component that enables you to test a web app with a test script that makes the browser believe that both are from the same source. This core Selenium was eventually known as Driven Selenium or Selenium B.
Selenium RC (Remote Control)
Selenium RC (remote control) was deployed by Dan Fabulich and Nelson Sproul in 2005 to enable a stand-alone server using an HTTP proxy. This solved the issue faced by Selenium Core with the same-host policy. Selenium RC was divided into two parts: Selenium Remote Server and Remote Client. The time it took both the server and the client to receive and send HTTP requests led to the slower execution of test cases; hence, Selenium RC became the least-used tool.
Selenium IDE
In 2006, a completely integrated development environment (IDE) was developed by Shinya Kasatani for testers. It was in the form of a plugin for Mozilla Firefox and Google Chrome web browsers. Selenium IDE ran functional tests in a live environment. Its main feature, originally known as Selenium Recorder, let testers record, replay, debug, and edit tests. The recorded steps were stored in a test script called a Selenese script. The test scripts could be exported to languages like Java, Ruby, JavaScript, and PHP. The IDE also provided data retrieval options for test cases performed for web apps. Selenium IDE is currently actively maintained by Kantu and Katalon.
Selenium Grid
It was difficult to test web applications on the new technologically-enabled devices that were emerging on the market. To solve this issue, in 2008, Philippe Hanrigou at ThoughtWorks developed a grid architecture that allowed you to test apps on any number of remote devices via a browser, which became Selenium Grid. It reduced the time needed to run test scripts on multiple remote devices because the tests ran in parallel. Test commands are executed on the remote devices through a browser. There are two components necessary to execute a test script on a remote device: a hub/server and a node/remote device.
The hub/server receives requests from the web driver client and routes them to the remote drivers. These drivers are registered on remote devices. A node/remote device has a local OS and browser. The web driver is the part of a browser that performs tests. When defining a script, you need to name a remote device, platform, browser, and so forth, to locate a specific node, and then test scripts are executed for that node.
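The hub-and-node routing described above can be illustrated with a toy model. The classes `Node` and `Hub` and their methods are hypothetical names invented for this sketch; they are not part of Selenium Grid, which performs this matching internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A registered remote device with a local OS and browser (toy model)."""
    name: str
    platform: str
    browser: str
    busy: bool = False


@dataclass
class Hub:
    """Keeps a list of registered nodes and routes requests to them (toy model)."""
    nodes: list = field(default_factory=list)

    def register(self, node):
        self.nodes.append(node)

    def route(self, platform, browser):
        """Return the first idle node matching the request, or None."""
        for node in self.nodes:
            if not node.busy and node.platform == platform and node.browser == browser:
                node.busy = True  # the node is now occupied by this test
                return node
        return None


hub = Hub()
hub.register(Node("node-1", "LINUX", "firefox"))
hub.register(Node("node-2", "WINDOWS", "chrome"))

# The test script names the platform and browser it needs.
chosen = hub.route(platform="WINDOWS", browser="chrome")
print(chosen.name if chosen else "no matching node")  # node-2
```

Because each node is marked busy only while it runs a test, several requests for different platform and browser combinations can be served at the same time, which is where the time saving comes from.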
Selenium WebDriver
Table 1-1. Web Browsers and Their Respective Selenium WebDrivers
Web Browser | Driver Name |
---|---|
Mozilla Firefox | Firefox (i.e., Gecko) |
Google Chrome | Chrome |
Apple Safari | Safari |
Opera | Opera |
Microsoft Internet Explorer | Internet Explorer |
Microsoft Edge | Edge |
Each driver listed in the table is maintained to support automation for its respective browser. Another browser driver, HTMLUnitDriver, simulates browsers using a headless browser (HtmlUnit).
Selenium WebDriver starts a web browser directly and controls it by executing commands. To avoid security conflicts and issues, WebDriver uses native OS functionality instead of browser-based JavaScript commands. The initial version of WebDriver focused on the interface. The later versions are Selenium 2.0 and Selenium 3.0.
Selenium 2.0
In 2007, Simon Stewart at ThoughtWorks created Selenium 2.0, which enables automation on almost all browsers. This version has fewer calls and allows testers/developers to create their own domain-specific language (DSL). The Watir web driver, which is implemented in Ruby, is one of the best examples of DSL.
Selenium 3.0
Developers Simon Stewart and David Burns made a draft to standardize Selenium, which was fully accepted and became a W3C standard protocol in 2019 when it became known as Selenium 3.0.
This completes the overview of Selenium and its evolution through the years; now let’s consider the Selenium architecture before diving into the test cases, which are covered in the upcoming chapters of this book.
Selenium WebDriver Architecture
Now that you know about Selenium’s various tools and versions, let’s look at a tool that helps automate test scripts for a web application. To automate a test script, there is an interaction between the tool and browser that can only be understood by its architecture.
Client Library
A client library is a language binding that lets you write Selenium tests in a particular programming language. The core languages supported by Selenium are Python, Java, JavaScript, Ruby, and C# (https://selenium.dev/downloads/). Languages like Perl, PHP, Go, Dart, R, and Haskell are supported by bindings maintained and developed by third parties (https://selenium.dev/thirdparty/). Selenium does not officially support these third-party language bindings.
JSON Wire Protocol
JSON stands for JavaScript Object Notation, a lightweight data format used to exchange information between a client and a server. JSON is an easy-to-understand text format, which led the ThoughtWorks developers to use the JSON wire protocol for communication between the client library and the web browser drivers. The server does not care which language is used on the client side; it only reads the data it receives in JSON format. The JSON wire protocol converts any data or information to JSON format before sending it to the server. It is a REST API.
REST (representational state transfer) defines a set of guidelines for developing an API (application programming interface). One of these guidelines is that each resource is addressed by a URL, and a request to that URL returns a response from the server.
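To make the exchange concrete, the sketch below builds the request a client library would send to ask a driver to open a page. In the JSON wire protocol, navigation is a POST to the `/session/{session id}/url` resource with the target URL in a JSON body. The function name `navigate_command` and the session ID `1234` are placeholders chosen for illustration, and nothing is sent over the network here.

```python
import json


def navigate_command(session_id, url):
    """Build the HTTP method, path, and JSON body for a 'go to URL' command."""
    method = "POST"
    path = f"/session/{session_id}/url"   # the resource is addressed by a URL
    body = json.dumps({"url": url})       # the data is converted to JSON text
    return method, path, body


method, path, body = navigate_command("1234", "https://www.apress.com")
print(method, path)  # POST /session/1234/url
print(body)          # {"url": "https://www.apress.com"}
```

Whatever language the test is written in, the driver only ever sees this method, path, and JSON text, which is why the same driver can serve every client library.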
Web Drivers
Each browser has a web driver associated with it (refer to Table 1-1 for more information). These web drivers are responsible for executing the commands received from the client library. The execution of these commands is done in a web browser, which communicates via HTTP.
Web Browsers
The browser executes the commands it receives from the web driver over HTTP. As with client libraries, both core and third-party options exist. The browsers supported by Selenium are Firefox, Internet Explorer, Safari, Opera, Chrome, and Edge. These browsers can be run on any operating system they support, such as Windows, macOS, or Linux. There are third-party web drivers developed for these web browsers, but they are not recommended.
Why Selenium?
Having examined the Selenium WebDriver architecture, let’s further enhance your understanding of Selenium.
Given the many testing tools available on the market, a question arises: Why use Selenium for testing? There are several answers to this question, but the primary reason to use Selenium is that it is open source (i.e., freely available to use). The benefits of using Selenium as a testing tool are discussed next.
Open Source
Selenium is open source and can be used at no cost. There is a large community of developers/testers who continuously maintain and support it. You can modify, integrate, or extend Selenium for any testing environment because the code is open source.
Platforms
Selenium WebDriver is cross-platform, which means that it has the flexibility to automate test cases in any operating system, such as Windows, macOS, Unix, and Linux. These test cases written in one OS can be used in any other.
Language Support
Another reason that Selenium has a large community is that it supports multiple programming and scripting languages, like Python, Java, Ruby, Perl, PHP, JavaScript, Groovy, Go, Dart, and so forth. This support is provided by the Selenium WebDriver language bindings/client libraries.
Browser
As with platform flexibility, Selenium WebDriver supports almost all browsers. Mozilla Firefox, Google Chrome, Safari, Opera, Internet Explorer, and Edge are supported; they are the most widely used around the globe.
Reuse
Once written, the test scripts can be used as needed across any browser or OS. There are no restrictions on using a test script multiple times.
Easy Implementation
The implementation of Selenium WebDriver depends on the environment and the script used by the developer/tester or organization. This variety is due to the vast number of potential OS and browser combinations. You can develop a customized web driver or framework to implement in a specific testing environment.
Flexible
Refactoring or regrouping in a test script enables you to reduce the amount of code duplication and other complications. Selenium provides developers/testers with this flexibility in a test script.
Hardware Resources
Unlike other testing tools, such as UFT, TestComplete, and Katalon Studio, Selenium requires fewer hardware resources.
Simulation
Selenium’s simulations create the real-time behavior of a mouse and keyboard. This helps to test advanced events like drag, drop, click, hold, scroll, and so forth, with a mouse and similar keypress events made on a keyboard.
The reasons discussed so far make a strong case for using Selenium as a testing tool, but comparing it with the other tools available helps confirm that Selenium is well suited to automating tests. This comparison is presented next.
Other Testing Tools
Comparing Selenium to Other Testing Tools
Feature | Selenium | Katalon Studio | UFT (Unified Functional Testing) | TestComplete | Watir |
---|---|---|---|---|---|
Release Year | 2004 | 2015 | 1998 | 1999 | 2008 |
Test Platform | Cross | Cross | Windows | Windows | Cross |
Test Applications | Web apps | Web/mobile apps, API/web services | Windows/web/mobile apps, API/web services | Windows/web/mobile apps, API/web services | Web/mobile apps |
Language Support | Python, Java, C#, Perl, JavaScript, Ruby, PHP | Java/Groovy | VBScript | Python, JavaScript, VBScript, JScript, Delphi, C++, C# | Ruby |
Installation Process | Easy to Intermediate (depends on the Selenium tool) | Easy | Easy | Easy | Advanced |
Programming Skills | Intermediate to Advanced (for writing desired test cases) | Advanced | Advanced | Advanced | Advanced |
Cost | Free | Free | Licensed with maintenance fees | Licensed with maintenance fees | Free |
License Type | Open source (Apache 2.0) | Freeware | Proprietary | Proprietary | Open source (MIT License) |
Product Support | Open source community | Community/business support | Dedicated staff/community | Dedicated staff/community | Open source community |
Now that you can see that Selenium is the best-suited automated tool for testing web applications, let’s look at why Python is the best language to integrate with Selenium.
Integrating Python with Selenium
The following points explain why Python integrates well with Selenium WebDriver.
Python's syntax resembles plain English, so code is easy to read.
Python is a high-level language rather than machine-level code, which makes it easy to write.
Python offers cross-platform support, which has resulted in a large community of followers.
Installing the Selenium WebDriver bindings for Python is straightforward compared with other languages.
Python supports the development of web and mobile applications. Python developers can easily migrate to Selenium WebDriver to test their applications because Selenium is supported in Python.
Selenium provides the Python API that connects straight to the browser. Since Python is less verbose, the Selenium commands are easy to write and execute when connecting to any of the supported browsers.
Python is an interpreted scripting language; therefore, there is no separate compilation step to convert code from one form to another before running it.
There is immense library support for Python due to the large number of communities behind it, who maintain and update it regularly. Selenium WebDriver can be extended to build more advanced test cases by automating as per the needs of the organization or individual.
Python libraries also support different language bindings. This helps to automate test cases for applications developed in other languages.
Listings 1-1 through 1-5 present a simple program for the languages that are supported by Selenium WebDriver. The program opens a browser, visits a specified URL, and searches a query in it. The program first imports the necessary Selenium WebDriver libraries. WebDriver then opens the Mozilla Firefox browser and specifies www.apress.com as the URL to visit. Next, it locates the search bar element in the Apress site. After locating the element, this book’s title, Python Testing with Selenium, is entered as a query in the search bar and submitted. The submitted query provides a new web page with the books associated with the query.
Python Code
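A minimal sketch of the steps described above might look like the following, assuming a Selenium release whose driver offers the `find_element(by, value)` method. The function name `search_site` is hypothetical, and the search box name `query` is an assumed value that should be checked against the live page. The browser start-up is shown as a commented usage example because it requires the selenium package, Firefox, and geckodriver to be installed.

```python
def search_site(driver, url, query, field_name="query"):
    """Open `url`, type `query` into the search field, and submit it.

    `driver` is expected to be a Selenium WebDriver instance.
    `field_name` is an ASSUMED name attribute for the search box;
    inspect the page to confirm it before relying on it.
    """
    driver.get(url)  # visit the specified URL
    # "name" is the locator strategy string that selenium's By.NAME stands for.
    search_box = driver.find_element("name", field_name)
    search_box.clear()           # remove any pre-filled text
    search_box.send_keys(query)  # type the query
    search_box.submit()          # submit the enclosing form
    return driver.title          # title of the page shown after submitting


# Usage (requires the selenium package, Firefox, and geckodriver):
#
#   from selenium import webdriver
#
#   driver = webdriver.Firefox()
#   try:
#       search_site(driver, "https://www.apress.com",
#                   "Python Testing with Selenium")
#   finally:
#       driver.quit()
```

Passing the driver in as a parameter keeps the search steps independent of the browser, so the same function can be reused with any driver that Selenium supports.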
Java Code
C# Code
Ruby Code
PHP Code
Summary
This chapter provided an overview of Selenium WebDriver, including an introduction to its various versions. Emphasis was given to Selenium’s architectural design, which provides the complete interaction process necessary to automate a test case. The importance of Selenium was discussed, including its multiple benefits and how it differs from other major testing tools. At the end of the chapter, the significance of integrating Python with Selenium was shown in simple test case scenarios using Python and other languages. Later chapters examine how this integration is done in different environments. Setup and configuration with Python are illustrated in the next chapter.