Before we explore pyquery and its features, let's install it using pip:
C:> pip install pyquery
A successful installation of pyquery using pip also installs its dependencies. The following libraries are installed:
- cssselect-1.0.3
- lxml-4.3.1
- pyquery-1.4.0
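The versions listed above are those reported at the time of writing; a newer installation will likely show different numbers. As a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is part of the standard library), the installed versions can be checked from Python itself. The helper name installed_version is hypothetical and used only for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report pyquery and the two dependencies that pip installs alongside it.
for name in ("pyquery", "lxml", "cssselect"):
    print(name, installed_version(name))
```

A value of None for any of the three names suggests that the installation did not complete as expected.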
Once the installation has completed successfully, we can import pyquery, as shown in the following code, to confirm the setup. We can explore the properties it contains by using the dir() function:
>>> from pyquery import PyQuery as pq
>>> print(dir(pq))
['Fn', '__add__', '__call__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '_filter_only', '_get_root', '_next_all', '_prev_all', '_translator_class', '_traverse', 'addClass', 'add_class', 'after', 'append', 'appendTo', 'append_to', 'attr', 'base_url', 'before', 'children', 'clear', 'clone', 'closest', 'contents', 'copy', 'count', 'css', 'each', 'empty', 'encoding', 'end', 'eq', 'extend', 'filter', 'find', 'fn', 'hasClass', 'has_class', 'height', 'hide', 'html', 'index', 'insert', 'insertAfter', 'insertBefore', 'insert_after', 'insert_before', 'is_', 'items', 'length', 'make_links_absolute',
'map', 'next', 'nextAll', 'next_all', 'not_', 'outerHtml', 'outer_html', 'parent', 'parents', 'pop', 'prepend', 'prependTo', 'prepend_to', 'prev', 'prevAll', 'prev_all', 'remove', 'removeAttr', 'removeClass', 'remove_attr', 'remove_class', 'remove_namespaces', 'replaceAll', 'replaceWith', 'replace_all', 'replace_with', 'reverse', 'root', 'show', 'siblings', 'size', 'sort', 'text', 'toggleClass', 'toggle_class', 'val', 'width', 'wrap', 'wrapAll', 'wrap_all', 'xhtml_to_html']
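The dir() output mixes special and private names (those beginning with an underscore) with the public methods we will actually call. The following is a small sketch of one way to filter the listing down to public names; the helper public_names is hypothetical and not part of pyquery, and the import is guarded so that the script still runs when pyquery is missing:

```python
def public_names(obj):
    """Return the names reported by dir() that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


try:
    from pyquery import PyQuery as pq
except ImportError:
    # pyquery (or one of its dependencies) is not available in this environment.
    print("pyquery is not installed")
else:
    print(public_names(pq))
```

The same helper works on any object, so it can also be used to inspect other classes from the libraries listed earlier.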
Next, we will explore the pyquery features that are relevant to scraping. For this purpose, we will use the page source from https://www.python.org, saved locally as test.html, so that the examples work on real-world content.
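The text does not say how test.html was produced; saving the page from a browser works just as well. As one possible sketch, the page source could be fetched and written to disk with the standard library alone. The function names fetch_page_source and save_page_source are assumptions made for this illustration:

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_page_source(url, timeout=10):
    """Download a page and return its source as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def save_page_source(html, path="test.html"):
    """Write the page source to a local file and return its path."""
    target = Path(path)
    target.write_text(html, encoding="utf-8")
    return target
```

Calling save_page_source(fetch_page_source("https://www.python.org")) with a working network connection would create test.html in the current directory.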
Obtaining the page source or HTML code alone is not enough, though: we need to load this content into the library to gain more tools to explore it with. We'll do this in the upcoming section.