Open Data resource pack: version 2

Resource pack to help public authorities develop and implement plans for open data. This is the second version of the document.


7. Select your data

How do you decide which data to publish first? Prioritisation of data release is necessary because it is impractical and potentially costly to release all your data at once. There is no definitive guidance on data prioritisation; an organisation can select its data in many ways, depending on its goals.

This section presents a list of practical guidelines based on best practice from around the world. Annex A has a simple downloadable checklist which has been developed to help you navigate each of the steps.

How to select your datasets

Step 1: Identify your data and create an asset register

Before you know which datasets to release, you must identify what data you hold. If you do not know what data your organisation has, then you may miss out on valuable data.

This may seem like a daunting task as your organisation will likely hold its data in various places and across multiple platforms, for example databases, spreadsheets, folders, documents and websites. Do not let the size and scale of the task put you off. This is an important step in your open data journey and will also be useful for other work related to Freedom of Information and Re-Use of Public Sector Information. Beginning with the identification of high level datasets and adding granularity over time will make the task more manageable.

It may be useful to ask colleagues across your organisation to help with this step, as people are likely to have a good understanding of the data held within their own department or division.

Capture metadata

When you are identifying your data you should begin to capture metadata. Metadata is descriptive information about the data. It can describe elements such as the content, format, currency and limitations. More guidance on metadata can be found elsewhere in this pack.

At this stage you should attempt to capture as much metadata as possible as it will make things simpler in later stages. You should begin with what you would like to include in your asset register. The checklist in Annex A provides a short list of key metadata elements which you should begin capturing. For more on capturing metadata, please see Section 8.

Asset Register

You now have enough information to create a comprehensive list of all the data you hold. Your asset register will be used to create an Open Data Publication Plan which will inform the public about the data you hold and intend to release as open data.
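As a rough illustration, an asset register can start out as a simple table with one row per dataset. The Python sketch below assumes a handful of metadata fields based on the elements mentioned above (content, format, currency and limitations), plus a title and an owner. The field names and the sample entry are hypothetical and are not the Annex A checklist.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class AssetRecord:
    """One row of a hypothetical asset register (field names are assumptions)."""
    title: str
    owner: str          # department or division that holds the data
    content: str        # short description of what the data covers
    data_format: str    # e.g. spreadsheet, database, PDF
    currency: str       # how up to date the data is
    limitations: str    # known gaps or caveats


def register_to_csv(records):
    """Serialise the register to CSV text so it can be shared and reviewed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AssetRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Illustrative entry only; not a real dataset.
register = [
    AssetRecord(
        title="Example library opening hours",
        owner="Example department",
        content="Opening hours for each branch",
        data_format="spreadsheet",
        currency="updated annually",
        limitations="excludes mobile libraries",
    )
]
csv_text = register_to_csv(register)
```

Keeping the register in a flat, tabular form like this makes it easy to start with high level datasets and add rows or columns as more detail is captured.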

Example asset registers

Department for Transport Information Asset Register

DCLG Data Inventory

Home Office Information Asset Register

This asset register does not need to be published and can be kept as an internal resource. However, it would be possible to combine an open data asset register with your organisation's PSI asset register. The 2015 PSI Regulations require your organisation to publish a register of both published and unpublished information assets which fall within its public task. The potential open data that your organisation holds may fall outwith its public task. The PSI asset register could be extended to cover all of the data your organisation holds.

Your register is not static: the information you hold will change over time. Your asset register(s) should be reviewed and updated at regular intervals.

Useful Links

National Archives Asset List Guidance

National Archives Identifying Information Assets and Business Requirements

National Archives Information Asset Register Guidance

National Archives Public Sector Information Guidance

W3C Best Practice - Discover published information by site scraping

W3C Best Practice - Identifying what you already publish

Value Assessment

During the initial stages of identifying data and capturing metadata, you should make an initial assessment about the data's value and priority for release. An initial value assessment can help identify potential priority releases. Each organisation will assess the value of their data differently, depending on their priorities and quality of available data. The checklist in Annex A has a handy list of areas which should be considered in order to assess value.
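One possible way to make an initial value assessment repeatable is a simple weighted score. The sketch below is only an assumption about how that might look: the three criteria, their weights and the 0 to 5 rating scale are hypothetical and are not taken from the Annex A checklist. Each organisation would substitute its own criteria.

```python
# Hypothetical criteria and weights; each organisation would choose its own.
WEIGHTS = {
    "supports_goals": 3,
    "user_demand": 2,
    "ease_of_release": 1,
}


def value_score(ratings):
    """Combine 0-5 ratings per criterion into one weighted score."""
    return sum(WEIGHTS[name] * ratings.get(name, 0) for name in WEIGHTS)


def rank_datasets(assessments):
    """Return dataset names ordered from highest to lowest score."""
    return sorted(
        assessments,
        key=lambda name: value_score(assessments[name]),
        reverse=True,
    )


# Illustrative ratings only.
assessments = {
    "Dataset A": {"supports_goals": 4, "user_demand": 2, "ease_of_release": 5},
    "Dataset B": {"supports_goals": 2, "user_demand": 5, "ease_of_release": 3},
    "Dataset C": {"supports_goals": 1, "user_demand": 1, "ease_of_release": 5},
}
ranking = rank_datasets(assessments)
```

A score like this is a starting point for discussion rather than a decision in itself; it simply highlights candidates for priority release.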

Step 2: Select the open data you want to publish

When it comes to selecting data to publish, there is no right way. The important thing is to begin putting data out there. We recommend starting small and building up. Focusing on a few key datasets will help you create a maintainable publishing process. You should then add more data over time.

You will have to consider dataset prioritisation. Which datasets should you release first? If you have identified a few priority releases, should these be released together or separately? When prioritising your data you will begin shaping a plan for future releases. This plan or schedule will be helpful when compiling your Open Data Publication Plan.

There are a number of easy ways to begin prioritising your data.

Start with your goal

You should return to the goals of your open data project and identify the datasets which support the realisation of those goals.

Quick wins

Sometimes an organisation may choose to release data which is easy to publish openly. Examples include upgrading data already published online (PDFs, Excel files, Word documents or other formats) into an open format. As this data is already released to the public, converting it to an open format should be easy and non-contentious. Another example would be publishing raw data alongside any analysis, if this is not done already.

Small, easy releases help get the project off the ground and build momentum, but organisations should be careful not to rely on easy releases for too long as the public may lose faith in the initiative if valuable datasets remain closed.

Demand driven release

Release the data that users want. All organisations have at their disposal a really valuable asset in the requests they get from the public, partner organisations, and other bodies for their data. These requests can be examined to see trends and consistency in type and extent of data required. This intelligence can then help inform the types of datasets that users want to see and the format and frequency that they want them released in.

Examine informal (e-mail/calls) and formal requests (FOISA requests) for data. Does your organisation have a Twitter or Facebook page? Check the comments to see if there are suggestions about possible data you could release. By making the most requested datasets available in a discoverable, open format you can satisfy public demand and help reduce administrative burdens on departments, for example through fewer enquiries or requests.
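Spotting trends in requests can be as simple as tallying the topics people ask about. The following sketch assumes you have already logged each request against a short topic label; the log contents and the function name are hypothetical examples, not real request data.

```python
from collections import Counter


def most_requested(request_log, top_n=3):
    """Count how often each topic appears and return the most common ones."""
    counts = Counter(
        topic.strip().lower() for topic in request_log if topic.strip()
    )
    return counts.most_common(top_n)


# Illustrative log only: one topic label per request received.
request_log = [
    "Bin collection dates",
    "school catchment areas",
    "bin collection dates ",
    "Planning applications",
    "Bin collection dates",
    "School catchment areas",
]
top_topics = most_requested(request_log, top_n=2)
```

Normalising case and whitespace before counting means that the same topic logged slightly differently by different teams is still counted together.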

Another way is to ask data users what they want. As they are the people who will be using the data, they will likely have a good understanding of which data would be useful. Invite users to suggest ideas via social media, surveys or on your own website. Hack events are another great way to generate interest in your open data and find out what people want or need to make their ideas a reality.

Perth and Kinross Council: Open Data Workshop

In October 2015, Perth and Kinross Council ran an "open data identification" workshop with community planning partners, regional organisations and council officers from a range of services. This ensured that a wide spectrum of individuals were able to give insight into which datasets would be most useful for Perth and Kinross to publish.

The datasets identified by the stakeholder workshop gave Perth and Kinross' team a solid basis from which to start creating a Publication Plan. You can read more about this in Annex B.

Scottish Government Dialogue App

Between 8th June and 14th July 2015 the Scottish Government held an open data discussion on the Dialogue App. The Dialogue App is a crowdsourcing tool designed for government. It allows the public to suggest, rate and comment on ideas in a collaborative way. The most popular and important ideas can then be easily identified and viewed.

As part of the Open Data Strategy, the Scottish Government made a commitment to engage with the public about which datasets they would like to see released from public sector organisations. The Dialogue App was chosen to hold the discussion as the format enabled everyone to participate in an open discussion.

Over the course of 5 weeks a total of 18 ideas were posted from 9 individuals. Nearly a quarter of the ideas submitted related to the release of information about public sector assets, physical (buildings, land) and non-physical (information, asset registers).

Follow best practice

Open data is growing and there are many public sector organisations both in Scotland and worldwide that are beginning to release their data openly. Don't reinvent the wheel: copy what has worked for others and build upon their success.

Cities and departments all over the world are beginning to release their own open data catalogues. Spend some time browsing their sites to see which datasets are popular and which ones your own organisation could release.

Examples of Open Data Portals

  • Scottish Official Statistics
  • Leeds Data Mill
  • The City of Edinburgh Council
  • SEWeb
  • UK Data.gov.uk
  • EU Open Data Portal
  • New York Open Data
  • Open Glasgow
  • US Data.gov

This list is a very small snapshot of the portals available!

Public services are also reliant on each other for learning and sharing good practice. There are a number of places online you can go to find peers from the wider Scottish public sector to learn from and work with as you start thinking about your own open data plan. In order to facilitate a productive exchange of ideas, we have opened a Knowledge Hub group. Digital Public Services - Open Data Network is a knowledge exchange and collaboration space. There is also the existing Open Knowledge Scotland Group, and an active network of practitioners and forward thinkers on Twitter. Stakeholders have noted that the practicalities around opening data are difficult and by strengthening these existing networks organisations will help each other progress.

The G8 Open Data Charter, Open Data Barometer and the Open Data Census have all published works detailing what should be considered high value datasets and priority releases. Of course, some of the datasets may not be relevant to your organisation and you may not be ready to release them just yet, but it is a good starting point if you don't know where to begin.

The following table lists the 14 categories which the G8 considers high value, priority releases. Examples of the types of data which fall under each category are also listed.

G8 High Value, Priority Releases

G8 Category | Example datasets
Companies | Company/business register
Crime and Justice | Crime statistics, safety
Earth observation | Meteorological/weather, agriculture, forestry, fishing, and hunting
Education | List of schools; performance of schools, digital skills
Energy and Environment | Pollution levels, energy consumption
Finance and contracts | Transaction spend, contracts let, call for tender, future tenders, local budget, national budget (planned and spent), international trade data
Geospatial | Topography, postcodes, national maps, local maps
Global Development | Aid, food security, extractives, land
Government Accountability and Democracy | Government contact points, election results (national and local), legislation and statutes, salaries (pay scales), hospitality/gifts
Health | Prescription data, performance data, doctor surgery locations
Science and Research | Genome data, research and educational activity, experiment results
Statistics | Data used to produce Official Statistics including the Census, sample surveys and administrative data. E.g. Datasets would include GDP, skills, unemployment
Social mobility and welfare | Housing, health insurance and unemployment benefits
Transport and Infrastructure | Public transport timetables, access points broadband penetration

Useful reading

Socrata - The data plan

W3C Best Practice - Discover published information by site scraping

W3C Best Practice - Identifying what you already publish

W3C Best Practice - Understand demand for data

Sunlight Foundation Open Data Guidelines 1 - 7

Step 3: Develop an Open Data Publication Plan

Once you have decided which data you want to publish as open data you should develop a publication plan. The benefit of an Open Data Publication Plan is that the public will have a comprehensive list of the datasets you will be publishing as open data and when they will be released.

The publication plan does not replace the publication scheme you are required to have under section 23 of FOISA. It should be part of your publication scheme, which should:

  • signpost your publication plan in your Guide to Information
  • explain briefly how your open data will be published

Contact the Scottish Information Commissioner for more information about the Freedom of Information (Scotland) Act and publication schemes.

The publication plan shows the authority's commitment to open data and demonstrates its understanding of the benefits which releasing data openly can bring. As a guide, it is recommended that any Open Data Publication Plan should:

  • tell users what information is available as open data
  • explain when the information will be available, if it is not already
  • tell users the currency of the data, available formats and licensing conditions
  • provide contact details should someone want to get in touch about the dataset
  • provide details about how users can make recommendations for future releases

An ambition of the Open Data Strategy was for all Scottish public authorities to have published their Open Data Publication Plans by December 2015. Given the range of public authorities in Scotland, it is recognised that not all will have been able to achieve this. Some have already published their plans, others are currently developing their own plans and some more are currently in conversation with their own governance structures around how to approach a publication plan. Annex A has a link to the template which has been designed to help organisations develop a relevant plan.

The template uses much of the information captured in the dataset asset register. The main difference between the asset register and the publication plan is that the publication plan will only identify the datasets that your organisation has released as open data, or intends to release as open data in the future.
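The relationship between the two documents can be pictured as a filter over the asset register. The sketch below is a minimal illustration under stated assumptions: the `open_data_status` and `release_date` keys, and the status values "published", "planned" and "internal", are hypothetical and are not taken from the Annex A template.

```python
def publication_plan(asset_register):
    """Keep only datasets already released, or planned for release, as open data."""
    plan = [
        entry for entry in asset_register
        if entry["open_data_status"] in ("published", "planned")
    ]
    # List published datasets first, then planned ones by release date.
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(
        plan,
        key=lambda e: (e["open_data_status"] != "published", e.get("release_date") or ""),
    )


# Illustrative register entries only.
asset_register = [
    {"title": "Dataset B", "open_data_status": "planned", "release_date": "2016-03-01"},
    {"title": "Dataset C", "open_data_status": "internal", "release_date": None},
    {"title": "Dataset A", "open_data_status": "published", "release_date": "2015-10-01"},
]
plan_titles = [entry["title"] for entry in publication_plan(asset_register)]
```

Deriving the plan from the register in this way means the register stays the single internal source, and the plan only needs regenerating when the register is reviewed and updated.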

Examples of organisations that have already published an Open Data Publication Plan include:

Contact

Email: Stuart Law, Stuart.Law@gov.scot

Kyle Malcolm, Kyle.Malcolm@gov.scot
