© Malathi Mahadevan 2018
Malathi Mahadevan, Data Professionals at Work, https://doi.org/10.1007/978-1-4842-3967-4_13

13. Steph Locke

Data Science Consultant, Locke Data
Malathi Mahadevan
Raleigh, NC, USA

Steph Locke heads up a data science consultancy, Locke Data (https://itsalocke.com), and is a Microsoft MVP for her efforts in the data science space. She leads technical communities at local and global levels to help grow people’s skills. A seasoned presenter, Steph delivers talks and keynotes at conferences around the world.

Steph transitioned from a business-heavy role early in her career into business intelligence and then data science. With over a decade of leveraging data to deliver benefit for companies, including finance organizations and startups, Steph has a proven track record of success. An experienced manager and member of leadership teams, she understands how to align data science with organizational goals and integrate it into company processes.

Now Steph runs her growing data science consultancy out of the United Kingdom with remote employees and freelancers distributed across Europe. As well as building her own startup, she blogs, presents, organizes, and codes around data science and DataOps. She’s on Twitter at @thestephlocke.

Mala Mahadevan: Describe your journey into the data profession.

Steph Locke: I actually started working at a burger van at a very young age. My job was initially to bake potatoes, flip burgers, and make hot dogs and stuff. And by the time I was sixteen or seventeen, I was managing it. I was looking after eight other young people in two square meters of space on most days, so I learned management, multitasking, and workflow optimization at a young age. That probably impacted my career a huge amount. But the smell of fried onions was permanently ingrained in my skin and my hair. It made me feel hungry wherever I went, but the job didn’t really challenge me intellectually.

I stopped working at the burger van, picked up various temp jobs, and before I finished my third year of university, I got a full-time job at Confused.com as a product analyst. I finished my degree at night. So, it was philosophy, lots of reading books and writing essays, which is probably one of the easiest degrees to do from home and without ever having to go to a lecture.

As a product analyst, my job was partly to provide analysis. When I started there was no data. It was from a third-party system. We just had a web browser. And my boss, before I got there, used to send out a daily report on how many people had used the service. He’d sit there with a post-it and do tally tables.

My job was to fix that. I also did a load of other really useful things. Like, work with suppliers, do contract negotiations. I did a load of digital marketing. I really got my hands stuck in a whole load of the business side, which was really useful. It gave me a strong appreciation for what makes money.

I was doing all of that business side of things whilst also jerry-rigging up a pretty badass Excel solution—consuming the website things and building whole amazing racks of reports off the back of it. Building cool forecasting systems to help us work out where we should spend our money. What sort of rate of returns we might get, things like that. But all of that was done in Excel off of some website pages. That wasn’t a very good long-term solution, and I could only do stuff with our one product and I couldn’t compare across multiple products. So, I convinced somebody to teach me some SQL.

I was a menace to the data warehouse team. I learned to write SQL, accidentally cross-joining and things like that with our ten million customers. So, I was a hazard for a little while, then I got good enough to know when to kill queries when I had written them badly. By that point, the data warehouse people and the BI team realized that it might be worth bringing me in from the cold and actually keeping an eye on me.

I moved into the team. My job was to take all the stuff that I had built in Excel and build a real solution. So, I got to do the database modeling for the data warehouse. Then the ETL in SSIS to get the data in, and then the reporting on top of it to present it back. And then wonderful SSRS reports. Even ones sent by email that looked good on Blackberries and stuff. So, a lot of end-to-end solutions learning experience. Almost everything I have done since has been finding that next thing that adds value and I’m not bored by doing. It’s been a fun journey.

Mala: Describe a few things you wish you knew when you started that you know now.

Steph: The biggest thing is do what’s important not what’s urgent. It’s so much better to do the important thing kind of well than do the urgent thing brilliantly. I still would love to be able to focus on the important things better.

Mala: What was analytics and visualization like five years ago? And why is it so hot now?

Steph: So, I actually wrote about this topic in a book called Tribal SQL [Redgate Books] in 2013. So, it is the perfect proof of what I thought things were going to be like. The things that I said were important at the time included SharePoint, which in the intervening years has stopped being important. But they also included things like understanding how to model your data, and database and report design. You should have a good grasp of ETL and SQL, and even a bit of HTML and CSS for styling and stuff. And I think almost everything except SharePoint is still pretty good on that list. And also, I had a section where I looked to the future. And the future was: SSRS sucks.

Mala: I love that!

Steph: Excel is great but risky. I call out Power Pivot and Power View, which are being rolled into Power BI. Important things that we should look at. Office 365 is going to be where it is at. And then, outside of the stack, I talk about the growth of things. Those two predictions have borne out pretty well. So yeah, I think we’ve come a long way from the kind of crappy SSRS reports that everybody hated. And we’re now in the position where obviously, we still need some SSRS reports, but we’ve got really nice interactive visualizations and solutions that people can get so much more out of without us having to do things.

And I think a lot of that change has been to do with interactivity. Most of that has been down to JavaScript. You know, like D3, Vega-Lite, Highcharts, and Lucidchart. There are just so many data visualization tools out there, and many have been integrated into great tools like Tableau and Power BI, which makes it easier for people to answer their own questions without having to learn to use an Excel spreadsheet.

Phenomenal. And I think that’s why it is so big now. It’s because we’ve got the ability for everybody from the developer to a business person to be able to make things that help other people so much more quickly than we ever had before.

Mala: Describe your experience with cloud adoption.

Steph: The UK didn’t have cloud data centers for the longest time. A lot of the blockers to cloud adoption were compliance. Storing data not in the UK, storing it on somebody else’s machines, did Microsoft have access to it, if the government’s watching us, and you know, that kind of paranoid but legitimate worry sort of things.

Microsoft and the other cloud providers have done a huge amount, in terms of compliance and data protection, and now I’d almost always say that data is safer on a cloud server than somebody’s data center.

So, we’re seeing a lot more people adopt putting their data in the cloud because they’ve had a chance to evaluate data protection and compliance issues, and work out where the real risks are.

The other area where I’ve seen a lot of adoption is kind of that decision loop that I was talking about for a business user. You have almost a technical loop as well. How do we go from needing some functionality, needing to add some business value, to getting that business value live? And previously, a big part of that would have been requesting a virtual machine from an admin, then going through configuration, and then doing the deployment. Everything just took longer. Now people are able to prototype and do things in production much more quickly, because the tooling around cloud computing can greatly reduce that technical loop.

Mala: So, what are the challenges in analytics today? Both from a technology point of view and a people point of view.

Steph: I think we still don’t have compliance solved. So, how do we track data through our systems? How do we cleanse data that we don’t need? How do we action requests for information under GDPR? How do we protect it for HIPAA and PCI DSS?

There are so many breaches that happen on a regular basis, and it’s because we still haven’t got to the point where we’re doing these things well, and doing them well across the board. So, that’s a big area.

Another big challenge in the data science world now is we’re predicting things and saying whether somebody picks a loan, or how much somebody’s healthcare bill’s going to cost, or whether they’re likely to commit a crime. We’re making huge assertions about what people will do and then change their ability to access everything from healthcare to the criminal justice system to finance and shopping. We have to be ethical about this. And not just, “the business told me to do it, so I did it.” Having orders is not sufficiently a good enough excuse.

And then the final thing is the technical loop and the decision loop. How do we keep shrinking the length of those loops?

We need more and more people to adopt automation and this sort of value-driven attitude because there are still a lot of people whose job is to click Next in a GUI. There are still people manually taking backups out there.

There are still people manually making SSRS reports and spending all their time making things align in Excel. There are so many places where we’re not using automation.

And, that’s the other big challenge to us today. How do we up-skill people, and how do we keep shortening those loop lengths? They’re important things.

Mala: Some people think storytelling means reading stories around data, and data should present facts. What’s your take on that?

Steph: When you present a fact, you present a story. When we visualize data our job is to help people draw the “right” conclusion as quickly as possible. Our goal is to keep that decision loop small. They don’t have the time to dive into the raw data. They need the facts that fit into their mental story as to what they’re doing. So, you need a beginning, middle, and end. But, when we’re preparing a piece or a dashboard, we’re contextualizing that in terms of how it’s going to be used. It fits within how somebody operates, and that’s their story. I definitely believe that we need to be thinking about how we make this data fit and how to present it so that it minimizes the time to understand it.

Mala: So, the story as a narrative makes sense. Not story as in fiction, right?

Steph: Yes.

Mala: What are the data quality issues that you face when you’re dealing with presentation and analytics, and how do you deal with that?

Steph: There is never an end to data quality issues. A common issue is poor front-end validation and if you get garbage in, you get garbage out.

Problems with data modeling. I see a lot of issues arise because the structure that the data is stored in... it makes it difficult for people to get the job done.

So, they might double information in columns or they repurpose something. They do things to get the job done, but the data doesn’t play nicely as a result.

A lot of the issues are lack of safety around data and lack of modeling around it. So, in terms of improving things, it’s usually a bit more planning and a willingness to refactor. So, if something isn’t working, if something takes a hundred lines of code, but it should be two or three lines of code, there should be a willingness to say, “I’m going to spend ninety-eight lines of code’s worth of time, making it so that I can write just two lines of code.”

If you’re starting a new system, it’s a great time to do it. But you can also do this piecemeal. Build quality checks in all along the way. The more checks and considerations for verifying that things are right, the better. And the more you can give somebody feedback and say this thing is wrong, or this thing needs consideration, the better.

Robust reconciliation and strong constraints on the database, like, say something should be two hundred characters and it should be Unicode. We shouldn’t just soft truncate. We should give you some feedback. “You’ve given us too much data, please fix.” Things like that.
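The point about rejecting over-long input instead of silently truncating it can be sketched in a few lines. This is a minimal illustration, not something from the interview: it uses SQLite only because it ships with Python, and the `customer` table, the `name` column, and the `add_customer` helper are all hypothetical names. SQLite stores `TEXT` as Unicode, and the `CHECK` constraint stands in for the two-hundred-character limit mentioned above.

```python
import sqlite3

MAX_NAME_LENGTH = 200  # the limit used as an example in the text


def create_customer_table(conn: sqlite3.Connection) -> None:
    # The CHECK constraint makes the database itself refuse over-long values
    # rather than quietly cutting them down.
    conn.execute(
        f"""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL CHECK (length(name) <= {MAX_NAME_LENGTH})
        )
        """
    )


def add_customer(conn: sqlite3.Connection, name: str) -> str:
    """Insert a customer and return a feedback message for the caller."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        # Hypothetical wording; the point is that the user hears about it.
        return (
            f"You've given us too much data ({len(name)} characters, "
            f"limit {MAX_NAME_LENGTH}), please fix."
        )
    return "Saved."


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_customer_table(conn)
    print(add_customer(conn, "Ada"))
    print(add_customer(conn, "x" * 201))
```

The rejected row never reaches the table, so bad data is stopped at the point of entry and the person who supplied it is told why.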

Mala: So, let’s talk about agile.

Steph: In terms of experience with agile and business intelligence, I was in a number of organizations over the years that implemented agile. Usually, it was the scrum methodology, and often it was top down. “We’re going to do agile, so everybody, we’re either going to send you on lots of training, or you’re in scrum teams and go!” And I don’t think I’ve ever been in an organization where that’s gone well.

I think that top-down decision making is one of the issues. And the other is that being forced into a system doesn’t help people understand that system very well. Especially with scrum, it ups the meeting workload for a lot of people. They really feel drained by more admin.

So that’s one big kind of issue. And, in BI, people often get used to being able to push out reports immediately. But then somebody comes along and says, “No, you have to do things in the two-week sprint.” And then people are asking, “Where’s my report?” And you have to say, “It’s only going to take me an hour, but you have to wait two weeks for it.”

That level of the cycle is too great. So, I think of it in decision loops. Somebody needs to make a decision. They need some data to make that decision. They have to get it, consume it, and then make the decision.

I’m definitely not a fan of agile. But, reducing that decision loop is all important for adding business value. So, for me, it helped me structure our team, our tools, and the way we think about things that come in request-wise. To minimize that decision loop.

Of course, the ideal answer for making that decision loop as small as possible is the information should already be there. And we don’t have to do anything. The fastest code is the code you don’t run.

Making things available in an easy-to-consume way for the user is key. So, self-service BI, such as it is, is a great way to go. We set things up and then people can answer questions for themselves. That’s the ultimate in speed, but of course, we can’t get everything answered.

I’m a big fan of Kanban, so you basically just have a great big to-do list—a doing column and a completed column. And that doing has limits so that you don’t spread your focus too much. And then the business people can just go prioritize that to-do list. And they only need to worry about the top things on the list. And it’s about getting things, at least for that BAU—you know, the business-as-usual. Getting it out as quickly as you can, in value order. It doesn’t have to be a big complicated system. It just has to be tasks on the to-do list effectively.
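The board described above (a prioritized to-do list, a limited doing column, and a completed column) is small enough to sketch directly. This is an illustration under assumptions, not a tool mentioned in the interview: the `KanbanBoard` class, its method names, and the sample tasks are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KanbanBoard:
    wip_limit: int  # maximum number of tasks allowed in "doing"
    todo: List[str] = field(default_factory=list)  # kept in value order
    doing: List[str] = field(default_factory=list)
    done: List[str] = field(default_factory=list)

    def add(self, task: str, position: Optional[int] = None) -> None:
        """Add a task; the business decides where it sits in the to-do list."""
        if position is None:
            self.todo.append(task)
        else:
            self.todo.insert(position, task)

    def start_next(self) -> Optional[str]:
        """Pull the top to-do item into "doing" if the limit allows it."""
        if not self.todo or len(self.doing) >= self.wip_limit:
            return None
        task = self.todo.pop(0)
        self.doing.append(task)
        return task

    def complete(self, task: str) -> None:
        """Move a task from "doing" to "done", freeing a slot."""
        self.doing.remove(task)
        self.done.append(task)


if __name__ == "__main__":
    board = KanbanBoard(wip_limit=2)
    for task in ["sales report", "fix ETL job", "new dashboard"]:
        board.add(task)
    board.add("urgent board pack", position=0)  # reprioritized to the top

    print(board.start_next())  # urgent board pack
    print(board.start_next())  # sales report
    print(board.start_next())  # None, because the doing column is full
    board.complete("sales report")
    print(board.start_next())  # fix ETL job
```

The only rule enforced is the limit on the doing column, which is what keeps focus from spreading too thin; everything else is an ordered list.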

Mala: What are some new technologies in the big data world that you’re particularly excited about?

Steph: I think that we now need to be more willing to put things in different structures than relational, especially because we have more buy-in now. Things don’t always have to fit in a table for us to do something with them.

So, we’ve got up for data science along with some systems like H2O and Spark, for doing that at scale. For storing my data in the format that makes it easiest to work with.

When we code a data transfer or an action, or something that needs to happen, we spend most of our time writing the connections, the retry logic, the logging, and the scalability code. We actually spend maybe 20 percent of the time writing actual business logic code.

With Azure Functions, you just write business logic. Everything else is taken care of. And that means you have less wasted time. You’re not doing things that don’t add value. You’re not reinventing the wheel each time. You’re just getting stuff done.
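The split between plumbing and business logic can be shown without any cloud service. The sketch below is plain Python, not the Azure Functions API: a hypothetical `with_retries` decorator carries the retry and logging code once, so that the function it wraps (`convert_to_gbp`, also hypothetical) contains nothing but business logic.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("plumbing")


def with_retries(attempts: int = 3, delay_seconds: float = 0.0):
    """Wrap a function with logging and simple retry logic."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    result = func(*args, **kwargs)
                    log.info("%s succeeded on attempt %d", func.__name__, attempt)
                    return result
                except Exception:
                    # Broad catch for illustration; real code would retry
                    # only on errors known to be transient.
                    log.warning("%s failed on attempt %d", func.__name__, attempt)
                    if attempt == attempts:
                        raise
                    time.sleep(delay_seconds)

        return wrapper

    return decorator


@with_retries(attempts=3)
def convert_to_gbp(amount_usd: float, rate: float) -> float:
    # The only business logic in this file.
    return round(amount_usd * rate, 2)


if __name__ == "__main__":
    print(convert_to_gbp(10.0, 0.8))
```

Written once, the decorator can be reused on every task, which is the same idea as letting a platform take care of everything except the business logic.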

And as data professionals, I think we should spend more time thinking in frameworks and systems so that we spend less time doing the same stuff, and spend more time doing the important value-add tasks.

Mala: What’s the role of documentation in being a good BI analytics person?

Steph: I think documentation is super important for every person. But I think it’s incredibly vital for BI people. And analytics people generally. So, when we’re building reports, we’re putting in filters and all sorts of things that construct a view of the data. And we expect people to be able to make decisions off of these. They can’t make the right decisions unless they understand what’s happening. They need to be able to understand that when we say number of sales, we mean things that have not been refunded within a week, that have a salesperson engage them, and all of these other things that would constitute a sale.

We need that to communicate the understanding properly. And that also helps the future you. So, I always try to be nice to Future Me, because Future Me has done a whole load of other stuff and projects. Basically, the knowledge that Present Me has on a system is going to be much better than the knowledge Future Me has. But Future Me is going to have better coding skills. Hopefully.

I need to give Future Me the information about how things worked, and then Future Me will improve things. So, documentation is as much for me as it is for other people. And, one thing that I have consistently found that helps me have better documentation is to have less code.

So instead of having one hundred SSIS packages and needing to document a hundred SSIS packages, I want an ETL system where those one hundred SSIS packages are automatically generated. Then my documentation focus is on the data sources and destinations. Documenting the code that generates those one hundred things. I can do that much more effectively than I can robustly document a hundred SSIS packages.
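A rough sketch of that idea: describe each source and destination once as metadata, then generate every load definition from it. This does not produce real SSIS packages; plain dictionaries stand in for them, and `TableMapping`, `generate_package`, and the table names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TableMapping:
    """One row of metadata: where data comes from and where it goes."""

    source_table: str
    destination_table: str
    key_column: str


def generate_package(mapping: TableMapping) -> Dict[str, str]:
    """Build one load definition from one metadata row."""
    return {
        "name": f"load_{mapping.destination_table}",
        "extract_sql": f"SELECT * FROM {mapping.source_table}",
        "destination": mapping.destination_table,
        "merge_key": mapping.key_column,
    }


def generate_all(mappings: List[TableMapping]) -> List[Dict[str, str]]:
    return [generate_package(m) for m in mappings]


# One hundred made-up mappings standing in for the hundred packages.
mappings = [
    TableMapping(f"src.table_{i:03d}", f"dw_table_{i:03d}", "id")
    for i in range(1, 101)
]
packages = generate_all(mappings)

if __name__ == "__main__":
    print(len(packages), packages[0]["name"])  # 100 load_dw_table_001
```

Only `generate_package` and the metadata need documenting; the hundred outputs follow from them.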

Mala: What are your favorite books, blogs, and other means of learning?

Steph: Books that have helped me to no end over the years to be generally better include Don’t Make Me Think by Steve Krug [New Riders, 2014]. That is a book on designing websites, which sounds like it’s irrelevant, but it’s all about identifying conventions, when you should use them and when you should break them. This applies to database design, report design, and effective communication strategies. It’s helping people adopt things quickly by leveraging conventions, and then knowing when you want to do something unconventional. So, Don’t Make Me Think is great.

The Checklist Manifesto [by Atul Gawande, Metropolitan Books, 2009]. It looks at a series of case studies on how checklists have reduced hospital infections or improved survival rates for mountain rescue. You know the guy who landed the plane on the Hudson River? That was all done basically by a checklist.

So, the aviation industry as a whole thoroughly believes in checklists, and they have effectively a runbook full of these things. In this situation use this checklist and go through it and make sure you don’t forget things and end up safely. So, checklists can be super beneficial. Once you have a checklist—a great way of doing things, then that makes it so much easier to automate.

Because you have your business requirements right there.

The Phoenix Project [by Gene Kim, IT Revolution Press, 2018] helps you understand how you can transform the way you run things to add more value. Nudge [by Richard H. Thaler, Penguin Books, 2009] is a book about the default values of things.

Mala: Default values, interesting.

Steph: If you set the default to be, enroll me in a pension, you’re more likely to accept the default than change to the nondefault, and as a result, more people are saving toward their future.

They still make the same decision. Do I have a pension or not? But we’re unbelievably lazy people who more often than not accept defaults.

So, having a conscious understanding of when we’re building stuff, what can we do to produce the best outcome by default, is very helpful.

Mala: What do you do to gain work/life balance or stress relief?

Steph: I’m very bad at this. So, especially since I’ve started up my consultancy business, that’s meant a lot more travel. Of course, I’m running my own business, so now I’m managing people as well and trying to keep the billable rates up and do all the business admin and things. I’ve settled for less balance on a daily level. More balance on an aggregate level. I might be on the road for five or six weeks quite a bit, and then I’ll try and spend at least two or three weeks a month at home, where I do work that doesn’t take me here, there, and everywhere. Trying to make every day perfectly balanced is a losing battle.

It is somewhat achievable. And it really helps of course, that I love what I do for a living. It’s very stimulating, and other hobbies often pale in comparison to building a new model, or trying out a new package, or reading a blog post. So, I’m very fortunate to enjoy my job.

Mala: What are your contributions to the community, and why do you recommend people be involved with the community?

Steph: I started getting involved in the community when I was twenty-three. I attended a few user groups, did a talk, and then the organizer of the user group was having a baby and moving to a different part of the country at the same time. So, he dropped the user group on me because I was the only enthusiastic one about things in terms of organization. I started running that, and at one point I was running three user groups, looking at a fourth. That’s when I realized I was being a bit insane.

So, instead I just made one super group called Microsoft Stack and kept our groups separate. I also now present around the world. I blog. I help people on different stack groups and on Twitter.

It’s been phenomenal because I’ve met so many great people. I have relatively few friends locally. Almost all of my friends live in other countries. So, going to a conference is actually going to see my friends. And they all understand what I do!

When I go to a conference, nobody asks me to fix a printer. So, it’s been really great from a social perspective.

And from a continuous development and professional development… I might be listening to ten hours of talks a month, which is one hundred twenty professional development hours a year, plus all the time I spend reading blogs or things like that. Over the past seven years, I’ve probably spent close to a thousand hours listening to talks and things and learning through that. And so I’ve gained a huge amount of knowledge from the communities. And it’s great to be able to give back. When I do a talk, that might be anywhere from ten people to three hundred people, getting hopefully an hour’s worth of useful content.

I’m absorbing one hundred twenty hours a year but I’m giving back something like five thousand hours a year of knowledge, which is a brilliant exchange rate in my view.
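The arithmetic behind that exchange rate can be checked using only the figures quoted above. The one assumption is reading the five thousand hours as audience-hours, meaning one hour of content multiplied by the number of people in the room; the function names are hypothetical.

```python
HOURS_LISTENED_PER_MONTH = 10      # from the interview
HOURS_GIVEN_BACK_PER_YEAR = 5000   # from the interview, read as audience-hours
YEARS = 7                          # from the interview


def hours_absorbed_per_year(hours_per_month: float) -> float:
    return hours_per_month * 12


def exchange_rate(given: float, absorbed: float) -> float:
    """Audience-hours delivered for every hour absorbed."""
    return given / absorbed


absorbed = hours_absorbed_per_year(HOURS_LISTENED_PER_MONTH)

if __name__ == "__main__":
    print(absorbed)          # 120 hours of talks a year
    print(absorbed * YEARS)  # 840 hours of talks over seven years
    print(round(exchange_rate(HOURS_GIVEN_BACK_PER_YEAR, absorbed), 1))  # 41.7
```

The 840 hours covers talks alone, which fits the "close to a thousand" figure once reading and other learning are added, and the rate works out to roughly forty hours given back for each hour absorbed.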

Mala: Wow. That’s a really fun way to put it. Very practical.

Steph: I’m a data person. I measure these things.

Mala: Is there a funny story you’d like to share? It doesn’t have to be work-related.

Steph: I got home from a day of training some people the other day. I’m sitting on the sofa with the laptop doing a load of business work and eating some popcorn, and I managed to knock the popcorn over. One of my dogs is trying to get it from down the side of the sofa, and it’s just stressing her out because she can’t reach the popcorn. Oz, my husband, very kindly gets the Hoover out to vacuum the side of the sofa, and this freaks out both my dogs, which are both at least forty pounds. And they both climb over me and the laptop, trying to get away from the Hoover. And my Y key goes with them.

So, there I am at seven p.m., the dogs have just crushed me, broken my laptop, and I’ve got a day of training the next day as well. Why is my life so hard? And then I looked online. Did you know there is a website I found that specializes in laptop key replacement?

Mala: Oh my, no. That would be useful.

Steph: The only problem is my laptop, which blue-screened to death the very first time I turned it on, appears to have a non-standard keyboard for the model. So, there’s a permanent reminder of the time when my dogs freaked out and broke my laptop. It was like a week or two ago and I’ve just been pressing the nubbin from underneath the key to type a Y. I’ve just been like completely off my game ever since when it comes to typing. Because everything is wrong now. But the shock, the sheer panic from the Hoover turning on was just unbelievably funny. Being stampeded upon, not so much. You know, it’s like the cucumber behind cat GIFs. They’re very fun to watch.

Key Takeaways

  • Do what’s important, not what’s urgent. It’s so much better to do the important thing kind of well than to do the urgent thing brilliantly.

  • We have to be ethical about how we use data for analytics. It is not just that the business told me to do it, so I did it. Having orders is not sufficiently a good enough excuse.

  • The fastest code is the code you don’t run.

  • We should spend more time as data professionals thinking in frameworks and systems so that we spend less time doing the same stuff.

  • With community, I’m absorbing 120 hours a year but I’m giving back something like 5,000 hours a year of knowledge, which is a brilliant exchange rate.

Recommended books: Don’t Make Me Think by Steve Krug, The Checklist Manifesto by Atul Gawande, The Phoenix Project by Gene Kim, Nudge by Richard H. Thaler
