Spark primes

We can run a series of numbers through a filter to determine whether each number is prime. We can use this script:

import pyspark

if 'sc' not in globals():
    sc = pyspark.SparkContext()

def is_it_prime(number):
    # make sure the number is a positive integer
    number = abs(number)

    # numbers below 2 are not prime
    if number < 2:
        return False

    # 2 is a special case: the only even prime
    if number == 2:
        return True

    # all other even numbers are not prime
    if not number & 1:
        return False

    # check odd divisors up to the square root
    for x in range(3, int(number**0.5) + 1, 2):
        if number % x == 0:
            return False

    # no divisor found, so it must be prime
    return True

# pick a good range
numbers = sc.parallelize(range(100000))

# see how many primes are in that range
print(numbers.filter(is_it_prime).count())

The script distributes the numbers from 0 up to (but not including) 100,000 across the Spark cluster.

Spark then passes each number to our filter. Every number for which the filter returns True is kept as a record, and we simply count how many records remain.
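The same filter-and-count pattern can be sketched in plain Python, without Spark, to show exactly what the job computes (Spark just performs this work in parallel across partitions):

```python
def is_it_prime(number):
    number = abs(number)
    if number < 2:
        return False
    if number == 2:
        return True
    if not number & 1:
        return False
    for x in range(3, int(number**0.5) + 1, 2):
        if number % x == 0:
            return False
    return True

# keep only the numbers for which the predicate returns True, then count;
# this is the single-machine equivalent of RDD.filter(...).count()
prime_count = sum(1 for n in range(100000) if is_it_prime(n))
print(prime_count)  # 9592 primes below 100,000
```

The count of 9,592 matches the known number of primes below 100,000, which is a handy sanity check on the filter.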

Running this in Jupyter, we see the count printed: 9,592 primes in the range.

This was very fast. I was waiting for it and didn't notice that it had already finished.
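To get a rough sense of the cost, the same computation can be timed on a single machine with the standard library; this is only a sketch, and the wall-clock time of the distributed Spark run will differ:

```python
import time

start = time.perf_counter()
# compact primality check, equivalent to is_it_prime above:
# keep 2, then keep odd n > 2 with no odd divisor up to sqrt(n)
count = sum(
    1 for n in range(100000)
    if n == 2 or (n > 2 and n & 1 and all(n % x for x in range(3, int(n**0.5) + 1, 2)))
)
elapsed = time.perf_counter() - start
print(count, f"primes found in {elapsed:.2f}s")
```

Trial division up to the square root keeps each check cheap, which is why even 100,000 candidates finish quickly.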
