We can run a series of numbers through a filter to determine whether each one is prime, using this script:
import pyspark

# reuse the existing SparkContext if the notebook already created one
if 'sc' not in globals():
    sc = pyspark.SparkContext()

def is_it_prime(number):
    # work with the absolute value so negative input is handled
    number = abs(number)

    # 0 and 1 are not prime
    if number < 2:
        return False

    # 2 is a special case: the only even prime
    if number == 2:
        return True

    # all other even numbers are not prime
    if not number & 1:
        return False

    # try every odd divisor up to the square root
    for x in range(3, int(number**0.5) + 1, 2):
        if number % x == 0:
            return False

    # no divisor found, so it must be a prime
    return True

# pick a good range
numbers = sc.parallelize(range(100000))

# see how many primes are in that range
print(numbers.filter(is_it_prime).count())
The script generates the integers from 0 up to, but not including, 100000.
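Spark then passes each of the numbers to our filter. If the filter returns True, the number is kept as a record in the result. The filter is lazy, so nothing is actually computed until we call count(), which runs the job and tells us how many results we found.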
Running this in Jupyter, we see the following:
This was very fast. I was still waiting for it to run when I noticed the result had already appeared.
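As a sanity check on the count, the same filter-and-count logic can be sketched in plain Python without Spark. This is only an illustrative cross-check, not part of the original script: the name prime_count is our own, and the built-in filter() stands in for the RDD's filter method.

```python
def is_it_prime(number):
    # same test as in the Spark script above
    number = abs(number)
    if number < 2:
        return False
    if number == 2:
        return True
    if not number & 1:
        return False
    for x in range(3, int(number**0.5) + 1, 2):
        if number % x == 0:
            return False
    return True

# keep only the numbers the filter accepts, then count them
# (prime_count is a hypothetical name used for this sketch)
prime_count = sum(1 for _ in filter(is_it_prime, range(100000)))
print(prime_count)  # prints 9592
```

If the Spark job reports a different number, the most likely cause is a change to the range or to the filter function.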