CSS selectors

In Chapter 1, Web Scraping Fundamentals, under the Understanding web development and technologies section, we learned about CSS, how it is used to style HTML elements, and how global attributes are used. CSS is normally used to style HTML, and there are various ways to apply it to a document.

CSS selectors (also referred to as CSS queries or CSS selector queries) are patterns that CSS uses to select HTML elements by element name or by global attributes (id and class). As the name suggests, CSS selectors provide various ways to select HTML elements.

In the following example code, we can identify a few elements found in <body>:

  • <h1> is an element, and its name, h1, also serves as a selector.
  • The <p> element has a class attribute with the value header. To select <p>, we can use the element name (p), the class name (.header), or both together (p.header).
  • Multiple <a> elements are found inside <div>, but they differ in their class attribute, id attribute, and the value of the href attribute:
<html>
<head>
<title>CSS Selectors: Testing</title>
<style>
h1{color:black;}
.header,.links{color: blue;}
.plan{color: black;}
#link{color: blue;}
</style>
</head>
<body>
<h1>Main Title</h1>
<p class="header">Page Header</p>
<div class="links">
<a class="plan" href="*.pdf">Document Places</a>
<a id="link" href="mailto:[email protected]">Email Link1!</a>
<a href="mailto:[email protected]">Email Link2!</a>
</div>
</body>
</html>

The distinguishable patterns identified in the preceding code can be used to select those elements individually or in groups. A number of DOM parsers that support CSS queries are available online. One of them, as shown in the following screenshot, is https://try.jsoup.org/:

Evaluating CSS query from https://try.jsoup.org/

The DOM parser converts the provided XML or HTML into a DOM object, a tree-type structure that facilitates accessing and manipulating elements or tree nodes. For more detailed information on the DOM, please visit https://dom.spec.whatwg.org/.
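
To make the idea of a tree of nodes concrete, here is a minimal sketch in Python that uses only the standard library's html.parser module. It is not a real DOM implementation: the Node and TreeBuilder classes and the outline function are hypothetical helpers written for this illustration, and the sketch assumes well-formed, properly nested HTML such as a trimmed copy of the sample page.

```python
from html.parser import HTMLParser

# A trimmed copy of the sample page used in this section.
HTML = """<html><body>
<h1>Main Title</h1>
<p class="header">Page Header</p>
<div class="links">
<a class="plan" href="*.pdf">Document Places</a>
</div>
</body></html>"""


class Node:
    """One element in the tree (hypothetical helper, not a real DOM node)."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects; assumes properly nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent  # climb back to the parent


def outline(node, depth=0):
    """Return one indented line per node, visiting the tree depth-first."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed(HTML)
tree_lines = outline(builder.root)
print("\n".join(tree_lines))
```

The printed outline nests h1, p, and div under body, and a under div. These parent, child, and sibling relationships are what a DOM parser exposes and what CSS queries rely on.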

In a CSS query, various symbols, as listed here, represent certain characteristics:

  • The global id and class attributes are represented by # and ., respectively, as seen in these queries:
    • a#link: <a id="link" href="mailto:[email protected]">Email Link1!</a>
    • a.plan: <a class="plan" href="*.pdf">Document Places</a>
  • Combinators (showing the relationship between elements) are also used, such as +, >, ~, and the space character, as seen in these queries:
    • h1 + p: <p class="header">Page Header</p>
    • div.links a.plan: <a class="plan" href="*.pdf">Document Places</a>
  • Operators such as ^, *, and $ are combined with = inside an attribute selector to match the start (^=), any part (*=), or the end ($=) of an attribute value, as seen in these queries:
    • a[href$="pdf"]: <a class="plan" href="*.pdf">Document Places</a>
    • a[href^="mailto"]: <a id="link" href="mailto:[email protected]">Email Link1!</a> and <a href="mailto:[email protected]">Email Link2!</a>
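
The id, class, and attribute-operator queries in the list above can be reproduced with a small Python sketch built on the standard library alone. This is an illustration under stated assumptions, not how jsoup or any browser evaluates selectors: TagCollector, SELECTOR, matches, and select are hypothetical names, the matcher understands only a single compound selector (optional tag, #id, .class, and one [attr="value"] test), and the example.com addresses are placeholders standing in for the obscured addresses in the sample page. Combinators are not handled here.

```python
import re
from html.parser import HTMLParser

HTML = """
<div class="links">
<a class="plan" href="*.pdf">Document Places</a>
<a id="link" href="mailto:user1@example.com">Email Link1!</a>
<a href="mailto:user2@example.com">Email Link2!</a>
</div>
"""


class TagCollector(HTMLParser):
    """Records (tag, attrs) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


# tag, #id, .class and [attr OP "value"], each optional, in that order.
SELECTOR = re.compile(
    r'^(?P<tag>[a-z][a-z0-9]*)?'
    r'(?:#(?P<id>[\w-]+))?'
    r'(?:\.(?P<cls>[\w-]+))?'
    r'(?:\[(?P<attr>[\w-]+)(?P<op>[\^$*]?=)"(?P<val>[^"]*)"\])?$'
)


def matches(selector, tag, attrs):
    """Return True if one element satisfies the (simplified) selector."""
    m = SELECTOR.match(selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    if m["tag"] and m["tag"] != tag:
        return False
    if m["id"] and attrs.get("id") != m["id"]:
        return False
    if m["cls"] and m["cls"] not in (attrs.get("class") or "").split():
        return False
    if m["attr"]:
        actual = attrs.get(m["attr"])
        op, val = m["op"], m["val"]
        if actual is None:
            return False
        if op == "=":
            return actual == val
        if val == "":
            return False  # an empty substring never matches in CSS
        if op == "^=":
            return actual.startswith(val)  # value begins with val
        if op == "$=":
            return actual.endswith(val)    # value ends with val
        return val in actual               # "*=": value contains val
    return True


def select(selector, html):
    """Return every (tag, attrs) pair in html that matches the selector."""
    collector = TagCollector()
    collector.feed(html)
    return [(t, a) for t, a in collector.elements if matches(selector, t, a)]


for query in ('a#link', 'a.plan', 'a[href$="pdf"]', 'a[href^="mailto"]'):
    print(query, "->", select(query, HTML))
```

Each query returns the same elements as in the list above: a#link and a.plan each return one link, a[href$="pdf"] returns the document link, and a[href^="mailto"] returns both email links.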

The following sections explain these symbols alongside the various types of selectors, with reference to the preceding HTML code.
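
As a quick check before those sections, the combinators listed earlier (+, >, ~, and the space character) can be traced by hand on a small element tree. The Python sketch below uses only the standard library and assumes properly nested HTML. Element, TreeBuilder, preceding_siblings, and ancestors are hypothetical helpers for this illustration, each combinator is spelled out as a plain list comprehension rather than parsed from a query string, and the example.com address is a placeholder.

```python
from html.parser import HTMLParser

HTML = """<body>
<h1>Main Title</h1>
<p class="header">Page Header</p>
<div class="links">
<a class="plan" href="*.pdf">Document Places</a>
<a id="link" href="mailto:user1@example.com">Email Link1!</a>
</div>
</body>"""


class Element:
    """An element with links to its parent and child elements."""

    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def has(self, tag, cls=None):
        """True if the element has this tag and, if given, this class."""
        classes = (self.attrs.get("class") or "").split()
        return self.tag == tag and (cls is None or cls in classes)


class TreeBuilder(HTMLParser):
    """Builds the element tree; text nodes are ignored, as combinators do."""

    def __init__(self):
        super().__init__()
        self.current = Element("#document", [], None)
        self.elements = []  # every element, in document order

    def handle_starttag(self, tag, attrs):
        node = Element(tag, attrs, self.current)
        self.current.children.append(node)
        self.elements.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent


def preceding_siblings(el):
    """Sibling elements that appear before el under the same parent."""
    siblings = el.parent.children
    return siblings[:siblings.index(el)]


def ancestors(el):
    """Yield the parent, grandparent, and so on up to the root."""
    node = el.parent
    while node is not None:
        yield node
        node = node.parent


builder = TreeBuilder()
builder.feed(HTML)
els = builder.elements

# h1 + p: a <p> whose immediately preceding sibling is an <h1>
adjacent = [e for e in els if e.has("p")
            and [s.tag for s in preceding_siblings(e)[-1:]] == ["h1"]]

# h1 ~ div: a <div> with an <h1> anywhere among its preceding siblings
general = [e for e in els if e.has("div")
           and any(s.has("h1") for s in preceding_siblings(e))]

# div > a: an <a> whose parent is a <div>
child = [e for e in els if e.has("a") and e.parent.has("div")]

# div.links a.plan: an <a class="plan"> with a <div class="links"> ancestor
descendant = [e for e in els if e.has("a", "plan")
              and any(a.has("div", "links") for a in ancestors(e))]

print([e.attrs for e in adjacent], [e.attrs for e in descendant])
```

Here h1 + p and div.links a.plan select the same elements shown in the list of queries, while h1 ~ div and div > a illustrate the two remaining combinators on the same markup.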
