Documentation

What is Open Data?

Open Data or reuse of PSI?

Open Data Principles

What does dataset mean?

Open Data Benefits

What is Open Data?

Open Data is a philosophy that aims to make the data that the public administration manages in the course of its work available to the public, in simple formats and without processing.

In this way, any citizen or company should be able to analyze, reuse and redistribute these data, generating new services and allowing the public administration to improve its openness to citizens (open government) and to promote the generation of wealth through the intelligent management of resources (smart government).

The aim is for citizens and companies to take advantage of these data to generate economic value. They can develop new ideas that produce new data, knowledge or services with economic and/or social benefits. These people or companies are usually called “infomediaries” or “reusers”.

Open Data or reuse of PSI?

The terms Open Data and reuse of PSI (Public Sector Information) are closely linked. In both cases, the aim is to make the administration's “raw data” available.

Although the two terms may seem similar, Open Data seeks to offer data in completely free (non-proprietary) formats and does not contemplate any payment for the data (they must be free of charge). The reuse of Public Sector Information, by contrast, allows for the possibility of charging for the use of the data and for their publication in any format.

Open Data Principles

To ensure that we are talking about Open Data, and not about a substitute that does not uphold the real Open Data philosophy, the data must fulfill the following principles:

  1. Public
    All public data must be open (that is, all data not subject to privacy, security or copyright restrictions). The administration therefore has no discretion to decide which data are published.
  2. Detailed
    Data must be published as they are, with no processing. This means that they must keep the maximum possible level of detail, which is what is called “raw data”.
  3. Updated
    Data must be published frequently enough to preserve their value, accuracy and currency.
  4. Accessible
    Data must be accessible to the greatest possible number of users. There must be no restrictions on those who want to use the data, whatever their purpose.
  5. Automatic
    Data must be structured so that they can be processed automatically by a computer. This is a very important condition for data to be reused automatically.
  6. No registration
    Data must be available to everybody, with no need to log in.
  7. Open
    The data formats must not have an owner. This means that they must not be controlled by an entity or depend on a tool owned by an entity. For example, CSV and XML are open formats, while Word and Excel are proprietary formats.
  8. Free
    Data must be 100% free. That is, data must be free of royalties, patents and copyright, and must not be subject to privacy, security or privilege rights, which may be regulated by other laws.
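
As a rough illustration of the “Automatic” and “Open” principles, the following Python sketch reads a small CSV document with the standard library and totals one column without any manual step. The column names and figures are invented for the example and do not come from any real dataset.

```python
import csv
import io

# Hypothetical sample data: district names and populations are invented.
SAMPLE_CSV = """district,population
North,1200
South,850
East,430
"""


def total_population(csv_text: str) -> int:
    """Sum the 'population' column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["population"]) for row in reader)


print(total_population(SAMPLE_CSV))
```

The same task on a scanned image or a PDF of the table would first require manual transcription or text recognition, which is what the principle is meant to avoid.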

What does dataset mean?

The term dataset refers to the categorization of public data in the data index. Raw data are organized into datasets so that they can be found and indexed easily. That is why we use fields that describe the data, such as description, update frequency, format and use license, among others.
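
A dataset entry of this kind could be modeled, for instance, as a small record. The sketch below is only an assumption about how such an entry might look: the class name, the field names and all the values are hypothetical and are not taken from any particular data index.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Index entry describing one dataset (field names are illustrative)."""
    title: str
    description: str
    update_frequency: str
    formats: List[str]
    license: str


# Hypothetical entry: every value here is made up for the example.
record = DatasetRecord(
    title="Street trees",
    description="Location and species of the trees on public streets.",
    update_frequency="monthly",
    formats=["CSV", "JSON"],
    license="CC BY 4.0",
)


def matches(entry: DatasetRecord, keyword: str) -> bool:
    """Very simple keyword search over the title and description."""
    text = f"{entry.title} {entry.description}".lower()
    return keyword.lower() in text


print(json.dumps(asdict(record), indent=2))
print(matches(record, "trees"))
```

Because the descriptive fields are themselves structured, the index can be searched and filtered by a program in the same way as the data it describes.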

The most commonly used formats for open data are:

  • CSV (Comma-separated values): a plain-text format that represents tabular data as columns separated by commas (or semicolons) and rows separated by line breaks. It is very easy to use, and in many cases it is trivial to export the data of an Excel sheet to CSV format.
  • XML (eXtensible Markup Language): a simple metalanguage that allows data to be interpreted by different languages. It is a widely used standard for the exchange of structured information between different platforms. Many databases allow their data to be exported to XML format.
  • RDF/XML / Turtle / N3 – RDF (Resource Description Framework) is not a specific format but a framework for describing web resources using expressions in subject-predicate-object form. The subject is the resource being described, the predicate is the property of the resource that is being stated, and the object is the value of that property. Combining RDF with other tools makes it possible to add meaning to pages, and it is one of the essential technologies of the semantic web. There are several representation formats: RDF/XML, for automatic processing; N3, a more human-readable plain-text representation; and Turtle, a simplification of N3.
  • JSON (JavaScript Object Notation): a lightweight data-interchange format. JSON is a subset of JavaScript's object literal notation and does not require the use of XML.
  • JSON-LD (JavaScript Object Notation for Linked Data) – a method of transporting Linked Data using JSON.
  • WMS (Web Map Service) – a service defined by the OGC (Open Geospatial Consortium) that dynamically produces maps of spatially referenced data from geographic information. It is an international standard that defines a map as a representation of geographic information in the form of a digital image file.
  • WFS (Web Feature Service) – also from the Open Geospatial Consortium, a standard service that offers a communication interface for interacting with the geographic features behind the maps served by WMS, for example to edit them or to analyze them following geographical criteria.
  • GML (Geography Markup Language) – an XML-based language, defined as a grammar in XML Schema, for the modeling, transport and storage of geographic information. Its importance lies in the fact that it serves as a lingua franca for handling and transferring information between the different software packages that use this type of data, such as Geographic Information Systems.
  • KML (Keyhole Markup Language) – a markup language based on XML for representing geographic data in three dimensions. It was developed for use with Keyhole LT, the forerunner of Google Earth. Its grammar has many similarities to that of GML.
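
To make the differences between some of these formats more concrete, the following Python sketch takes the same two records and writes them as JSON, as XML and as subject-predicate-object triples. It is only an illustration under several assumptions: the station data are invented, http://example.org/ is a placeholder namespace rather than a real vocabulary, and the triples are printed in N-Triples syntax, a line-based RDF serialization not listed above, chosen here because it is the simplest to generate by hand.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical data: station identifiers and names are invented.
CSV_TEXT = "id,name,bikes\n1,Central,12\n2,Harbour,5\n"
BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))


def to_json(records):
    """Serialize the records as a JSON array of objects."""
    return json.dumps(records)


def to_xml(records):
    """Serialize the records as <stations><station>...</station></stations>."""
    root = ET.Element("stations")
    for row in records:
        station = ET.SubElement(root, "station", id=row["id"])
        ET.SubElement(station, "name").text = row["name"]
        ET.SubElement(station, "bikes").text = row["bikes"]
    return ET.tostring(root, encoding="unicode")


def to_triples(records):
    """Express each cell as a (subject, predicate, object) triple."""
    triples = []
    for row in records:
        subject = f"{BASE}station/{row['id']}"
        for field in ("name", "bikes"):
            triples.append((subject, f"{BASE}{field}", row[field]))
    return triples


def to_ntriples(triples):
    """Write the triples in N-Triples syntax, one statement per line.

    Literal values are not escaped here, so this only works for simple text.
    """
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)


print(to_json(rows))
print(to_xml(rows))
print(to_ntriples(to_triples(rows)))
```

In the triple form, each station URI is the subject, the column name becomes the predicate and the cell value is the object, which matches the subject-predicate-object description given for RDF.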

Other formats, such as images (JPEG, GIF, TIFF, etc.), prevent data from being considered open, since they are not structured and machines cannot interpret their content automatically.

PDF files, although originally designed for portability, are also unsuitable because their content is not necessarily structured: they can contain images, or consist entirely of an image that contains text.

Formats such as Word or Excel require a license for their use, so they are not advisable in the field of open data. Data in Excel format can easily be exported to text formats such as CSV, which comply with the non-proprietary format requirement.
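
The distinction drawn in the last paragraphs can be summarized in a small lookup. The sketch below classifies a file by its extension, following the grouping used in this document; the extension lists are an assumption made for the example and are not exhaustive.

```python
from pathlib import PurePath

# Grouping taken from the text above; the extension sets are illustrative.
OPEN_FORMATS = {".csv", ".xml", ".json", ".jsonld", ".rdf", ".ttl", ".n3", ".gml", ".kml"}
UNSTRUCTURED = {".jpeg", ".jpg", ".gif", ".tiff", ".tif", ".pdf"}
PROPRIETARY = {".doc", ".docx", ".xls", ".xlsx"}


def classify(filename: str) -> str:
    """Return 'open', 'unstructured', 'proprietary' or 'unknown'."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in OPEN_FORMATS:
        return "open"
    if suffix in UNSTRUCTURED:
        return "unstructured"
    if suffix in PROPRIETARY:
        return "proprietary"
    return "unknown"


for name in ("budget.csv", "budget.xlsx", "minutes.pdf"):
    print(name, classify(name))
```

A check like this only looks at the file name; it says nothing about whether the content inside the file is actually well structured.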

Open Data Benefits

According to Law 37/2007 on the Reuse of Public Sector Information: “the information generated by public bodies, with the potential afforded by the development of the information society, is of great interest to companies when operating in their fields of activity, contributing to economic growth and job creation; and to citizens as an element of openness and a guide to democratic participation”. By improving its efficiency and its ability to interoperate with other administrations, the Administration also benefits from this openness. In conclusion, there are three groups that benefit: companies, citizens and the Administration.

Benefits for companies

The economic advantages of Open Data stem from the possibility for companies to produce economic value from the public data published by the Administration, creating services and applications from those free data.

This creates a new market niche based on digital content, which helps to generate wealth and makes it possible to offer value-added services. It also promotes competitiveness among companies, which can build offerings on this free public information and obtain a benefit from them.

Benefits for citizens

The main advantage of the free dissemination of public data is that it brings government closer to the principles of open and smart government; that is, a government that is constantly in contact with citizens and that makes citizen participation and collaboration easy.

Public data can be used to produce applications and new services of social value that improve citizens' day-to-day lives. The creation of new services by private initiatives, including open data indexes, also creates new jobs.

In addition, publishing public data on the web provides more openness, since citizens, companies and other organizations can make use of them. This is a big step toward information openness and toward achieving one of the aims of open government. Citizens can now have a clearer view of the administration's actions and services, as well as of how it invests their contributions and manages public resources.

Benefits for the public administration

Administrations can significantly reduce the cost of expensive applications, because these can now be developed by infomediary companies, which in turn invigorates the economy.

The Administration also benefits from citizen collaboration: citizens cooperate actively to improve public services with their own content, ideas, initiatives and new applications built on free public data.

Exchanging data among different administrations (local, central, regional) promotes interoperability and results in greater efficiency in the work of the Administration and of public employees. All this increases cooperation among administrations, benefiting citizens.

Interoperability also reduces costs: if two datasets refer to the same kind of information and share the same format, new uses of the data can be obtained easily.
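
As a minimal sketch of this point, the Python code below combines two CSV extracts that share the same columns and computes a joint total. The two administrations, the column names and the figures are all invented for the example.

```python
import csv
import io

# Hypothetical extracts from two administrations with identical columns.
CITY_A = "municipality,year,recycled_tonnes\nCity A,2023,120\n"
CITY_B = "municipality,year,recycled_tonnes\nCity B,2023,95\n"


def merge_csv(*documents: str) -> list:
    """Concatenate CSV documents that share exactly the same header."""
    merged, header = [], None
    for text in documents:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("datasets do not share the same columns")
        merged.extend(reader)
    return merged


rows = merge_csv(CITY_A, CITY_B)
total = sum(int(row["recycled_tonnes"]) for row in rows)
print(total)
```

When the columns differ, the function refuses to merge, which is the situation where someone would otherwise have to write and maintain a conversion step between the two datasets.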

The possibility for citizens to cooperate allows political leaders to stay informed about the concerns and interests of local residents.