Chapter 4

Open Data

Higher and Further Education Stepping up to the Challenge

Rachel Bruce and Andy McGregor, Jisc

It has been impossible to avoid open data in the last few years. This has been driven by support from governments the world over and innovative work in the UK from organisations like data.gov.uk and the Open Data Institute. Despite some strong examples, higher and further education is not moving as quickly as the government. Although the UK government is making great strides, there is not yet compelling evidence on the impact of this effort. So higher and further education is left with the question: is this something we should seek to address urgently, or is it more prudent to wait until undeniable evidence of impact is available before deciding to dedicate significant effort to exploiting open data?

Keywords

Open data; UK government; higher education

Defining open data is a non-trivial task: the definition needs to encompass how data is made available and how it can be used. Anyone with experience of any of the open clans will know that open comes in many guises. The Open Knowledge Foundation gives a full definition of what open means for data (http://opendefinition.org/) but it also gives a succinct version:

Open data and content can be freely used, modified, and shared by anyone for any purpose

What this means in practice is that data that was traditionally managed and used within an organisation is made available via the web for people to examine and exploit. The range of data that this could cover is enormous, from A-road traffic data (http://data.gov.uk/dataset/congestion_on_local_a_roads) all the way to enterprise zones (http://data.gov.uk/dataset/enterprise-zones), and via further education and skills inspection outcomes (http://data.gov.uk/dataset/official-statistics-further-education-and-skills-inspection-outcomes) and student loans (http://data.gov.uk/dataset/student_loans).

At its most basic level, making data available can be via a PDF, but real benefits only start to occur once machine readable formats are used. Machine readable means anything that can be processed by a computer. This can be a simple spreadsheet, but the really interesting use cases and benefits are realised when Linked Data is used. This data format focuses on the relationships between items and enables software developers to produce applications that can do more useful things with the data. It also makes it easier to connect different data sets to produce innovative new uses.
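To make the idea of relationships between items concrete, here is a minimal sketch in Python of data held as subject, predicate, object statements, which is the general shape Linked Data takes. Every URI, vocabulary term and value below is invented for illustration, and a real application would normally use an RDF library rather than plain tuples.

```python
# Hypothetical vocabulary and resources; none of these URIs are real.
VOCAB = "http://example.org/vocab/"
COURSE = "http://example.org/course/M101"
BUILDING = "http://example.org/building/B12"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    (COURSE, VOCAB + "title", "Introductory Mathematics"),
    (COURSE, VOCAB + "taughtIn", BUILDING),
    (BUILDING, VOCAB + "name", "Maxwell Building"),
    (BUILDING, VOCAB + "postcode", "AB1 2CD"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def building_names_for_course(triples, course_uri):
    """Follow course -> building -> name, a two-step walk across the links."""
    names = []
    for building in objects(triples, course_uri, VOCAB + "taughtIn"):
        names.extend(objects(triples, building, VOCAB + "name"))
    return names


if __name__ == "__main__":
    print(building_names_for_course(TRIPLES, COURSE))
```

Because the building is identified by a URI rather than a free-text name, a second data set that uses the same URI could be appended to the list of triples and queried in exactly the same way.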

4.1 Why It Matters

Making data available openly is not free: the preparation of the data and the provision of it cost time and effort. So it is interesting to ask why, in an era of austerity, the UK government has pursued an open data approach with such vigour. Unsurprisingly, it is for hard practical reasons rather than anything ideological. Open data offers many benefits. The most compelling are:

1. Transparency: Many organisations have a duty to their stakeholders to show what they are doing and how. Open data enables this directly by allowing data literate stakeholders direct access to the data and indirectly by enabling people to build applications that allow the layperson to explore the data via websites, visualisations and applications.

2. Efficiency: Engaging in open data may produce efficiency savings. Savings could come from refining the production and management of data across an organisation, for example by linking up previously separate data sets that contain duplicate items. Another possibility is reducing the amount of time spent helping stakeholders by enabling them to find and use the open data whenever they need it.

3. Innovation: Opening data up in machine readable formats allows entrepreneurs or amateurs to use that data to build new applications that people may find useful. This could be done by using a single data source in an unexpected way or by connecting previously unrelated data sets to support a new type of use. Rufus Pollock, the president of the Open Knowledge Foundation, summed this up as: ‘The best thing to do with your data will be thought of by someone else’ (http://rufuspollock.org/misc/). As well as producing economic value from this new use of the data it can also illustrate new possibilities that can influence the direction of an organisation.

4. Participation: If data is transparent and delivered via innovative applications, this can lead to greater involvement from stakeholders in the work of an organisation. This is most useful in government, where participation is desirable on a number of fronts. But other organisations may find participation from certain types of stakeholders, for example alumni, to be desirable. This benefit may be challenging to achieve because of fears that a lack of technical and data analysis skills may prevent all but a few from engaging in detail and may produce inequalities between the data haves and the have nots.

These benefits are not only theoretical: a number of studies have looked at the economic value of engaging in open data. CapGemini produced a report that estimated that opening government and public data had a direct impact of €32 billion on the EU27 economy in 2010, with a forecasted growth of 7% a year (http://www.uk.capgemini.com/resources/the-open-data-economy-unlocking-economic-value-by-opening-government-and-public-data). However, this may be an underestimate, since a BIS report on the value of the Ordnance Survey open data predicts that by 2016 this data set alone will have increased Great Britain’s GDP by between £13 million and £28.5 million (https://www.gov.uk/government/publications/ordnance-survey-open-data-economic-value-study).

Nearly all the examples we have used so far have come from the work of the UK government. It is not only the leading exemplar of open data in the UK; reports such as CapGemini’s (referenced above) and the Open Data Barometer (http://www.opendataresearch.org/barometer) also put it near the top of governments worldwide when it comes to open data.

This started in 2009 when open data was mandated across UK government departments. It continued despite the change of government in 2010, and that year data.gov.uk, the website that collects open data sets (http://data.gov.uk/), was created. Five years later the site lists over 16,000 published data sets. Data.gov.uk uses a 5 star rating to judge how open a data set is (http://5stardata.info/). The majority of data sets listed get a rating of 0 stars, which means they are unavailable or not openly licensed. But of those that are available, most are at level three and above, which means structured data in an open format. There are over 350 apps on the site, which deal with issues from flooding, to safe neighbourhoods, to visualisations of where money is spent.
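The star rating can be read as a ladder in which each level builds on the one below. The sketch below follows the criteria summarised at 5stardata.info (openly licensed, structured, open format, URIs, linked to other data). The `Dataset` class and its field names are hypothetical, and this is not a description of how data.gov.uk itself calculates its ratings.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published data set."""
    available_online: bool
    open_licence: bool
    machine_readable: bool     # structured, e.g. a spreadsheet not a scanned PDF
    open_format: bool          # non-proprietary, e.g. CSV
    uses_uris: bool            # things are identified by URIs
    links_to_other_data: bool  # links out to other people's data


def star_rating(d):
    """Return 0 to 5 stars; each star requires all the stars below it."""
    if not (d.available_online and d.open_licence):
        return 0
    stars = 1
    ladder = (d.machine_readable, d.open_format, d.uses_uris, d.links_to_other_data)
    for criterion in ladder:
        if not criterion:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    csv_release = Dataset(True, True, True, True, False, False)
    print(star_rating(csv_release))
```

In this reading, a CSV file published under an open licence reaches level three, which matches the description of level three as structured data in an open format.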

The government has been assisted in this work by bodies like the Open Data Institute (http://opendatainstitute.org/), which promotes open data and engages developers, and the Open Knowledge Foundation (https://okfn.org/), which built CKAN (http://ckan.org/about/), the software that data.gov.uk is built on.
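CKAN sites generally expose a web API for searching their catalogue, which is one way developers build on a portal. The sketch below only constructs a search URL for CKAN's `package_search` action and reads titles from a response. Whether data.gov.uk serves the API at this exact path is an assumption, and the sample response is invented, although it follows the general `success` and `result` envelope that CKAN returns.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=5):
    """Build a CKAN package_search URL (the base URL is an assumption)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract data set titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [pkg.get("title", "") for pkg in payload["result"]["results"]]


# Invented response for illustration; real results will differ.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "student_loans", "title": "Student loans"},
            {"name": "enterprise-zones", "title": "Enterprise zones"},
        ],
    },
})

if __name__ == "__main__":
    print(package_search_url("http://data.gov.uk/", "student loans"))
    print(dataset_titles(SAMPLE_RESPONSE))
```

Keeping the URL construction and the response parsing separate from any network call means the logic can be checked offline before it is pointed at a live portal.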

At the moment government strategy is focused on the development of a national information infrastructure (http://data.gov.uk/consultation/national-information-infrastructure-prototype-document/purpose-nii). This will develop the work done so far and build a better documented picture of what data is available and how it can be used. This will make it easier to find data sets and reuse them in useful ways, therefore increasing the benefits to be seen from this wealth of data.

So the UK is at the forefront of open data: it has a rich and diverse data collection and a detailed strategy for ensuring the full potential value of this collection is realised. If the benefits are sufficiently high to attract this level of effort from the UK government, then should further and higher education be doing more to exploit open data?

4.2 Where Is Further and Higher Education?

While the government is ahead of the university and college sectors in the UK in terms of coordinated open data strategy and implementation, it should be noted that much of the underpinning rationale and technology for open data originated in universities and research institutes. For example, the invention of the World Wide Web by Sir Tim Berners-Lee at CERN could be claimed as the birth of open data; and since this time academics, such as those at Southampton and the Open Knowledge Foundation, have continued to develop the concept. Open data and sharing have also been used in research for centuries. A fantastic example of open data reuse is that of Matthew Fontaine Maury, a historian and oceanographer in the nineteenth century, who studied ship logs and meteorological data; on the back of this data analysis he published a chart of the winds and currents that enabled sailors to speed up their course.

In the UK education and research sector, as machine readable formats for open data became easier to implement, Jisc supported early experiments in open Linked Data, and as part of this the Open University undertook the LUCERO (Linking University Content for Education and Research Online) project. It became the first UK university to open up its data, making data sets including course data, research outputs, open educational resources and administrative data available (http://data.open.ac.uk/). It was closely followed by the universities of Southampton and Lincoln. These developments have enabled the support of applications. For example, at Southampton applications have been developed that include an interactive university map widget, a catering ‘menu search’ function, university telephone directories and apps that make navigating open days easier. James Leeming, the Retail Catering Manager at Southampton University, says:

As a Caterer I am often quoting that ‘I bake bread, I don’t do IT!’ we like to keep it simple and this is exactly what Open data does for us. We can use formats and software we are used to and manage up to date real time information. This will ensure we are keeping customers up to date with information that they want. There is more to come as well and in Catering we have designed our whole web site and marketing strategy around the Open Data technology, watch this space, Catering is catching up.

[insert reference from the analysis of the value and impact of Linked Data – Jisc report]

In an attempt to provide a platform for all open data across UK academia, Southampton University developed the data.ac.uk hub, which acts as a single point of contact for open data from universities in a similar way to data.gov.uk.

Opening up data of different types, including manuscripts and historical records, can enable text mining. The Trading Consequences project, about Britain’s reliance on overseas commodities in the 1800s, has used text mining to explore thousands of pages of related documents for terms associated with commodity trading. Vast amounts of digitised historical records are available about the extent to which Britain relied on overseas commodities in the nineteenth century but despite this huge resource, and maybe even because of its size, the story is impossible to track accurately by hand. Trading Consequences has made sense of these records and provided a visual analysis tool.

Big data is an area of significant interest to all fields of research, and digital text mining has created major efficiencies when comparing a vast number of documents, as well as unearthing new correlations and discoveries; we’ve seen this strongly in biomedicine as they deal with an ever increasing amount of research outputs, and now also in social sciences and humanities in initiatives like Digging into Data. The introduction of copyright exemptions for text mining for non-commercial research gives the UK a lead over many other countries. The only other country in the world that has a similar text and data-mining exception is Japan. Opening up data clearly lends itself to the advantages of text and data mining.

Citizen science is a trend that demonstrates how open data can accelerate research. One of the most quoted examples is Galaxy Zoo. In 2007 a data set of a million galaxies from the Sloan Digital Sky Survey was opened up to all for analysis. The team were astounded when, within 24 hours of the launch of the data set, they were receiving 70,000 classifications an hour. They say that ‘In the end, more than 50 million classifications were received by the project during its first year, contributed by more than 150,000 people.’ They received multiple independent classifications, which made it possible to test the reliability of the classifications, and the team have proved that the citizen participation was as accurate as professional astronomers’ work. This data set has been of use to many researchers and it continues to be today.
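One simple way to see why multiple independent classifications help is to combine them by majority vote and accept a label only when enough volunteers agree. The sketch below illustrates that general idea; it is not the Galaxy Zoo team's actual method, and the thresholds and labels are hypothetical.

```python
from collections import Counter


def consensus(votes, min_votes=3, min_agreement=0.8):
    """Return the majority label, or None if agreement is too weak.

    `min_votes` and `min_agreement` are illustrative thresholds only.
    """
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None


if __name__ == "__main__":
    clear_case = ["spiral"] * 9 + ["elliptical"]
    split_case = ["spiral", "elliptical", "spiral", "elliptical"]
    print(consensus(clear_case))
    print(consensus(split_case))
```

Objects that fail to reach consensus can be set aside for expert review, so volunteer effort is concentrated where it is reliable.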

Universities have also started to open up usage data. In a Jisc supported initiative, universities experimented to test whether they could use data in a similar way to big companies such as Tesco and Amazon. With the leading example from Huddersfield University, which made its library usage data openly available, a small but significant development has taken place whereby better services can be offered to learners and researchers by using patterns of behaviour to inform personalised experiences, and also by correlating such data with student attainment. For example, if a student isn’t using the library they may be at risk of falling behind, and the university can make an intervention to help to prevent their potential dropout. Jisc has worked with a number of universities to develop a prototype national service that brings data such as the National Student Survey, library usage data and other data sets together to help to inform data-driven decision-making.
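As a rough illustration of correlating usage with attainment, the sketch below computes a Pearson correlation between library loans and marks and flags students whose usage falls below a threshold. The records, field names and threshold are all invented, and this is not the method used at Huddersfield or in the Jisc prototype.

```python
from math import sqrt

# Invented records for illustration only.
RECORDS = [
    {"student": "s1", "loans": 0, "mark": 48},
    {"student": "s2", "loans": 5, "mark": 55},
    {"student": "s3", "loans": 12, "mark": 62},
    {"student": "s4", "loans": 20, "mark": 68},
    {"student": "s5", "loans": 31, "mark": 74},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def low_usage_students(records, threshold):
    """Return the students whose loan count is below `threshold`."""
    return [r["student"] for r in records if r["loans"] < threshold]


if __name__ == "__main__":
    r = pearson([rec["loans"] for rec in RECORDS],
                [rec["mark"] for rec in RECORDS])
    print(round(r, 2))
    print(low_usage_students(RECORDS, threshold=3))
```

A strong correlation in data like this would not show that borrowing books causes higher marks; it only suggests that low usage may be a useful prompt for a conversation with the student.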

The policy environment is supporting a move towards open data. For example, research funders are encouraging openness, and the Universities UK (UUK) work on efficiency sees open data as something that can support more efficient sharing and reuse in various ways. However, despite this there is a long way to go, and if we compare the education and research sectors with the government it can be argued that these sectors are behind. It appears that there are pockets of excellent practice that have started to prove the benefits, but these are not yet connected, and universities and colleges are not on the whole treating open data as a strategic direction. This is perhaps not surprising given the range of use cases and types of data that there are, and the diverse nature of universities and colleges. Coordination is challenging. Whilst it might be true to say that some of the benefits are a bit of a leap of faith, the examples above do show that there are innovations that can come from open data.

So why should further and higher education take open data seriously? It is, we argue, for very similar reasons to those of the government.

Transparency is a driver, for example knowing what research funding has produced, or giving fee-paying students confidence in curricula and courses. It may not be as obvious a case as it is for government, but it is likely to become more important as the sector needs to demonstrate the way it goes about its business and contributes to the economy and society. Data about higher education is collected through bodies such as HESA and UCAS. Much of this data is available, but there might well be more openness possible in support of further transparency.

Innovation is important: as well as innovation in research and learning, there are collaborations to be exploited that open data can support. The Gateway to Research, where information about the research funded by the UK Research Councils is made openly available on the web, ensures that industry and small and medium enterprises (SMEs) know about publicly funded research so that new collaborations can be forged. This also points to participation. Open data can help to improve and widen participation, whether this is via citizen science or via wider participation in education through open educational resources and MOOCs; essentially it increases the reach of the education and research sector.

The Diamond Review from UUK on efficiency in higher education considered open data and recognises that it goes beyond efficiency. There are work streams dedicated to open data that aim to encourage the exploitation of data in support of improved business processes and intelligence whereby administrative data can be better used to lower costs and improve processes; where open data can be used to improve student recruitment, choice and experience and where research management and reuse can be improved via openness.

One of the most significant areas in terms of policy is that of open research data. The Research Councils and other research funders are promoting this, since openness can support research integrity by ensuring that results are verifiable, and can support reuse that drives new research findings. The sector is making advances here, but it is a challenge, as culture needs to change and technical infrastructure to support data sharing needs to be in place at universities and at a national and international level.

In order for education and research to grasp the open data nettle in the same way as the government sector there is a need to address a few central issues. These are:

1. A central and strategic driving force: this could build on the UUK efficiency work, the Open Data Institute and the Open Research Data Concordat that is being developed between research funders and universities. For this to happen there will be a need for partnership working, since a variety of stakeholders have important roles in the developing open data landscape.

2. There are technical foundations that can help open data to flourish, in particular the use of identifiers. This includes identifiers for data objects, for people, for organisations and for places. Identifiers enable connections, links and correlations to be made between different sets of data. Jisc is working with stakeholders on some related agreements, for example the use of researcher identifiers such as ORCID (http://orcidpilot.jiscinvolve.org/wp/) or organisational identifiers such as ISNI (http://www.isni.org/). Alongside this, data aggregations need to be coordinated, including data.ac.uk and the national research data registry that Jisc is developing.

3. Legislative barriers need to be overcome: there have been great strides such as the Public Sector Information directive and the UK data and text-mining exception. However, there are still barriers, for example the legal deposit legislation still severely limits access to copies of the UK web archive.

4. Cultural change is required: this is perhaps the toughest aspect of all. Top-down initiatives are helping to drive change, for example data becoming more prominent in research assessment, but cultural change is greater than this. There is a need for a data skilled workforce, and for all to understand the benefits and be convinced of the rewards of open data so they will participate.
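The point about identifiers in item 2 above can be illustrated with a small sketch: two separately maintained data sets become linkable as soon as they share a researcher identifier. The records, field names and grant codes below are invented, and the identifier strings are placeholders in ORCID's 16-character layout rather than real ORCID iDs.

```python
from collections import defaultdict

# Hypothetical publications data set, keyed by a placeholder researcher ID.
PUBLICATIONS = [
    {"orcid": "0000-0000-0000-0001", "title": "Paper on ocean currents"},
    {"orcid": "0000-0000-0000-0002", "title": "Paper on galaxy morphology"},
    {"orcid": "0000-0000-0000-0001", "title": "Paper on wind charts"},
]

# Hypothetical grants data set maintained by a different body.
GRANTS = [
    {"orcid": "0000-0000-0000-0001", "grant": "G-100"},
    {"orcid": "0000-0000-0000-0003", "grant": "G-200"},
]


def link_by_identifier(publications, grants, key="orcid"):
    """Join the two data sets on a shared identifier field."""
    grants_by_id = defaultdict(list)
    for g in grants:
        grants_by_id[g[key]].append(g["grant"])

    linked = {}
    for p in publications:
        ident = p[key]
        if ident in grants_by_id:
            entry = linked.setdefault(
                ident, {"grants": grants_by_id[ident], "titles": []}
            )
            entry["titles"].append(p["title"])
    return linked


if __name__ == "__main__":
    for ident, entry in link_by_identifier(PUBLICATIONS, GRANTS).items():
        print(ident, entry["grants"], entry["titles"])
```

Without a shared identifier the same join would depend on matching names, which is unreliable when researchers share a name or change institution.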

It is hard to escape the conclusion that there are great benefits here for higher and further education if we want to seize them. There are significant barriers but the work of data.gov.uk has paved the way. Further and higher education is full of people with diverse skills and innovative ideas so we are well placed to surmount those barriers. We think that Jisc’s role as a national body also offers opportunities for producing some of the technical infrastructure and support that would need to be reliably and sustainably provided for open data to flourish. When you think of the rich and growing data sets produced by people working in further and higher education and the new and exciting innovations that it could support it seems the answer for the education and research sector is to step up and grasp the nettle of open data.
