Iterating

In this section, we demonstrate the iteration (repeated processing) facility available in pyquery. It is an effective and simple way to process a set of matched elements in many situations.

In the following code, we search for the name and property attributes of the <meta> tags whose content attribute contains the string Python.org. We also use Python's list comprehension technique to keep each extraction to a single line:

# Find <meta> elements whose 'content' attribute contains 'Python.org'
# and list the 'name' attribute of each element matched by find()

>>> meta = page.find('meta[content*="Python.org"]')
>>> [item.attr('name') for item in meta.items() if item.attr('name') is not None]
['application-name', 'apple-mobile-web-app-title']

# Continuing from the preceding code, list the values of the 'property' attribute

>>> [item.attr('property') for item in meta.items() if item.attr('property') is not None]
['og:site_name', 'og:title']

As the preceding code shows, we call items() on the meta object inside a loop to iterate over the matched elements. items() yields each matched element as its own pyquery object, so methods such as attr() and text() can be called on every item. Values that are None are excluded from the list:

>>> social = page.find('a:contains("Socialize") + ul.subnav li a') 
>>> [item.text() for item in social.items() if item.text() is not None]
['Google+', 'Facebook', 'Twitter', 'Chat on IRC']

>>> [item.attr('href') for item in social.items() if item.attr('href') is not None]
['https://plus.google.com/+Python', 'https://www.facebook.com/pythonlang?fref=ts', 'https://twitter.com/ThePSF', '/community/irc/']

>>> webdevs = page.find('div.applications-widget:first ul.menu li:contains("Web Development") a')
>>> [item.text() for item in webdevs.items() if item.text() is not None]
['Django', 'Pyramid', 'Bottle', 'Tornado', 'Flask', 'web2py']

In the preceding code, pyquery collects the names and links available in the social and web development sections. These can be found under Use Python for... in the following screenshot. Each object is iterated using a Python list comprehension:

Upcoming events to be extracted using pyquery

In the following code, we extract a few more details by iterating over upcomingevents:

>>> eventsList = []
>>> upcomingevents = page.find('div.event-widget ul.menu li')
>>> for event in upcomingevents.items():
...     time = event.find('time').text()
...     url = event.find('a[href*="events/python"]').attr('href')
...     title = event.find('a[href*="events/python"]').text()
...     eventsList.append([time, title, url])
...
>>> eventsList

eventsList contains extracted details from Upcoming Events, as shown in the preceding screenshot. The output from eventsList is provided here:

[['2019-02-19', 'PyCon Namibia 2019', '/events/python-events/790/'], ['2019-02-23', 'PyCascades 2019', '/events/python-events/757/'],
['2019-02-23', 'PyCon APAC 2019', '/events/python-events/807/'], ['2019-02-23', 'Berlin Python Pizza', '/events/python-events/798/'],
['2019-03-15', 'Django Girls Rivers 2019 Workshop', '/events/python-user-group/816/']]

DevTools can be used to identify a CSS selector for a particular section, which can then be processed further with the looping facility. For more information regarding CSS selectors, please refer to the XPath and CSS selectors using DevTools section in Chapter 3, Using LXML, XPath, and CSS Selectors.

The following code illustrates a few more examples of the pyquery iterating process via the use of find() and items():

>>> buttons = page.find('a.button')
>>> for item in buttons.items():
...     print(item.text(), '::', item.attr('href'))
...

>_ Launch Interactive Shell :: /shell/
Become a Member :: /users/membership/
Donate to the PSF :: /psf/donations/

>>> buttons = page.find('a.button:odd')
>>> for item in buttons.items():
...     print(item.text(), '::', item.attr('href'))
...

Become a Member :: /users/membership/

>>> buttons = page.find('a.button:even')
>>> for item in buttons.items():
...     print(item.text(), '::', item.attr('href'))
...

>_ Launch Interactive Shell :: /shell/
Donate to the PSF :: /psf/donations/

For more information on the features, attributes, and methods of pyquery, please refer to https://pythonhosted.org/pyquery/index.html.
