The Impact of Tool Choice

The Top Tools

The top two tools in the sample were Excel and SQL, both with use by 69% of the sample, followed by R (57%) and Python (54%). Compared to last year, Excel is up (from 59%), as is R (from 52%), while SQL and Python are only slightly higher than last year.

Over 90% of the sample reported spending at least some time coding, and 80% used at least one of Python, R, and Java, although only 8% used all three. The most commonly used tools (except for operating systems) were included in the model training data as individual coefficients; of these, Python, JavaScript, and Excel had significant coefficients: +4.6, –2.2 and –7.4, respectively. Less commonly used tools were first grouped together into clusters and aggregate features were included that represent counts of tools used from each cluster. For five clusters that were found to have a significant correlation with salary, coefficients are added on a per-tool basis.1

The cluster with the largest coefficient was centered on Spark and Unix, contributing +3.9 per tool. Spark usage was 20%, up from last year’s a modest 3%, and it continues to be used by the more well paid individuals in the sample.

In contrast to the largely open source Spark/Unix cluster, the second highest cluster coefficient (+2.4) was assigned to a cluster dominated by proprietary software: Tableau, Teradata, Netezza, Microstrategy, Aster Data, and Jaspersoft. In last year’s report, Teradata also featured as a tool with a large, positive coefficient. The other three clusters with significant coefficients mostly consisted of open source data tools.

Which Tools to Add to Your Stack

While the model we’ve explained is a good way to get an estimate for how much someone earns given a certain tool stack, it doesn’t necessarily work as a good guide for which tool to learn next. The real question is whether a tool is useful for getting done what you need to get done. If you never have to analyze more data than can fit into memory on your local machine, you might not get any benefit—much less a salary boost—by using a tool that leverages distributed systems, for example.

Salary and Sequences of Tools

In the following sequences of tools, the next tool in the sequence was frequently used by respondents who used all earlier tools, and these sequences had the best salary differentials at each step.

If you know the first tool in a sequence, you might consider learning the second, and so on.

1 Tools are added up to a maximum number. This is because few respondents had more than that number of tools from the cluster, and so if someone uses more, there is no evidence to support continued addition of coefficients.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.221.165.246