Some other metrics

And, of course, after quantifying our package descriptions a bit, we can use the standard data analysis tools as well. For example, let's look at the length of the documents in the corpus:

> vnchar <- sapply(v, function(x) nchar(x$content))
> summary(vnchar)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00   27.00   37.00   39.85   50.00  168.00
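The same measurement can be reproduced without the `tm` corpus objects. The following is a minimal base R sketch, not the transformations used to build `v`. It assumes a plain named character vector of made-up descriptions (`desc` is hypothetical) and a tiny hand-written stop-word list standing in for `tm::stopwords`. It cleans each description roughly the way the corpus was cleaned and then counts the remaining characters:

```r
# Hypothetical package descriptions (made up, not taken from CRAN)
desc <- c(pkgA = "Tools for Reading 3 Types of Data Files.",
          pkgB = "An Interface to the API",
          pkgC = "Fast, Flexible and Friendly Plotting Functions for R")

# Tiny stand-in stop-word list (assumption: tm::stopwords() is far longer)
stop_words <- c("a", "an", "and", "for", "of", "the", "to")

clean_description <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:digit:]]+", "", x)   # drop numbers
  x <- gsub("[[:punct:]]+", "", x)   # drop punctuation
  words <- strsplit(x, "[[:space:]]+")[[1]]
  words <- words[nzchar(words) & !words %in% stop_words]  # drop stop words
  paste(words, collapse = " ")
}

cleaned <- vapply(desc, clean_description, character(1))
desc_len <- nchar(cleaned)   # characters left after cleaning
summary(desc_len)
```

Because the counts are taken after cleaning, they understate the length of the raw descriptions.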

So, the average package description consists of around 40 characters, while one package has a description only two characters long. Well, two characters after removing numbers, punctuation, and common English stop words. To see which package has this very short description, we can simply call the which.min function:

> (vm <- which.min(vnchar))
[1] 221
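Note that which.min returns only the first position holding the minimum. If several descriptions shared the shortest length, the others would go unnoticed. The following short sketch uses a made-up named vector (`desc_len` is hypothetical) to contrast which.min with `which()`, which returns every tied position:

```r
# Hypothetical cleaned description lengths, named by package
desc_len <- c(pkgA = 30L, pkgB = 2L, pkgC = 43L, pkgD = 2L)

which.min(desc_len)                  # first minimum only: pkgB at position 2
which(desc_len == min(desc_len))     # all ties: pkgB (2) and pkgD (4)
```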

And this is what's strange about it:

> v[[vm]]
<<PlainTextDocument (metadata: 7)>>
NA
> res[vm, ]
    V1   V2
221    <NA>
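The `<NA>` in this output points to a missing value in the scraped table. Rows like this could be caught before the corpus is built. The following is a minimal sketch on a hypothetical stand-in for `res` (`res_toy`). It assumes, for illustration only, that `V1` holds the package name and `V2` the description:

```r
# Hypothetical stand-in for the scraped table `res`; that V1 holds the
# package name and V2 the description is an assumption for illustration
res_toy <- data.frame(
  V1 = c("pkgA", "", "pkgC"),
  V2 = c("Tools for reading data files", NA, "Plotting functions"),
  stringsAsFactors = FALSE
)

# Flag rows with a missing/blank package name or a missing description
empty <- is.na(res_toy$V1) | !nzchar(trimws(res_toy$V1)) | is.na(res_toy$V2)
which(empty)

# Keep only rows describing real packages before building the corpus
res_clean <- res_toy[!empty, ]
nrow(res_clean)
```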

So, this is not a real package after all, but rather an empty row in the original table. Let's visually inspect the distribution of the number of characters in the package descriptions:

> hist(vnchar, main = 'Length of R package descriptions',
+     xlab = 'Number of characters')

The histogram suggests that most packages have a rather short description, no longer than a single sentence: an average English sentence includes around 15-20 words, or roughly 75-100 characters.
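The visual impression can be backed up with a single number: the share of descriptions below the lower end of that sentence-length estimate. The following is a minimal sketch on made-up lengths. With the real data, `vnchar` would take the place of the hypothetical `desc_len`:

```r
# Hypothetical description lengths in characters (stand-in for vnchar)
desc_len <- c(2L, 27L, 37L, 50L, 80L, 168L)

# Share of descriptions shorter than the low end (75 characters)
# of the average English sentence length quoted in the text
mean(desc_len < 75)
```

Since the lengths were measured after removing numbers, punctuation, and stop words, this comparison slightly overstates how short the original descriptions are.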
