Roberto Boselli et al.
Unfortunately, the quality of labour market administrative archives is very low (see, e.g., Cesarini, Mezzanzanica and Fugini, 2007). The dataset extracted from the administrative archive of an Italian Region undergoes some cleansing activities during the ETL process; the RDQA approach presented in this paper has been used to assess and improve the overall data cleansing process.
3.1 Domain description
Here we provide an overview of the key domain concepts of the administrative archives analysed. Every time an employer hires or dismisses an employee, or an employment contract is modified (e.g. from part‐time to full‐time, or from fixed‐term to unlimited‐term), a Mandatory Communication is sent to the Mandatory Communication System and stored in a local database (job registry, or registry hereafter). The registries are managed at "provincial level" for several administrative tasks: every Italian province has its own job registry, which records (as a side effect) the working history of its inhabitants.
In the context of the labour market domain, we introduce the following concepts:
• Event: it represents a mandatory communication (i.e., a data item) composed of the following attributes: (w_id, e_id, e_date, e_type, c_flag, c_type, empr_id), where:
  • w_id: an id identifying the person involved in the event;
  • e_id: an id identifying the event;
  • e_date: the date on which the mandatory communication occurred;
  • e_type: the type of event occurring in the worker's career. The allowed event types are: the start or the cessation of a working contract, the extension of a fixed‐term contract, or a contract type conversion;
  • c_flag: it states whether the event is related to a full‐time or a part‐time contract;
  • c_type: the contract type with respect to Italian law (e.g. fixed‐term or unlimited‐term contract, etc.);
  • empr_id: it uniquely identifies the employer involved in the mandatory communication.
• Career: it is composed of a finite sequence of events, ordered with respect to the e_date attribute, describing the personal history of working contracts from the beginning to the end of the data observation period.
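The two concepts above can be sketched as a small data model. This is only an illustrative sketch: the class names, the `build_career` helper, and the string encodings chosen for `c_flag` and `c_type` are assumptions made here, not part of the archive schema described in the text.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class EventType(Enum):
    """The four event types allowed by the domain description."""
    START = "start"
    CESSATION = "cessation"
    EXTENSION = "extension"
    CONVERSION = "conversion"


@dataclass(frozen=True)
class Event:
    """One mandatory communication (attribute names follow the text)."""
    w_id: str          # worker identifier
    e_id: str          # event identifier
    e_date: date       # date on which the communication occurred
    e_type: EventType  # start, cessation, extension or conversion
    c_flag: str        # "FT" or "PT" (assumed encoding)
    c_type: str        # e.g. "fixed-term" or "unlimited-term" (assumed encoding)
    empr_id: str       # employer identifier


def build_career(events, w_id):
    """Return the events of worker `w_id`, ordered by e_date (hypothetical helper)."""
    return sorted((e for e in events if e.w_id == w_id), key=lambda e: e.e_date)
```

A career is then simply the date-ordered list returned by `build_career` for a given worker; events sharing the same date keep their input order, since Python's sort is stable.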
We look closely at the consistency of worker careers, where the consistency semantics is derived from Italian labour law, from domain knowledge, and from common practice. The following constraints can be identified:
• c1: an employee cannot have more than one full‐time contract at the same time;
• c2: an employee cannot have more than K part‐time contracts (signed by different employers); in our context we assume K = 2, i.e. employees cannot have more than two part‐time jobs active at the same time;
• c3: a contract extension can change neither the existing contract type (c_type) nor the part‐time/full‐time status (c_flag); e.g., a part‐time fixed‐term contract cannot be turned into a full‐time contract by an extension;
• c4: a conversion requires either the c_type or the c_flag to be changed (or both).
For simplicity, we omit some trivial constraints, e.g. an employee cannot have a cessation event for a company for which she/he does not work, an event cannot be recorded twice, etc.
If all the semantic constraints are met, a career is considered consistent; otherwise it is inconsistent. For example, a consistent career can evolve by signing a part‐time contract with company i, then activating a second part‐time contract with company j, then closing the second part‐time contract, and later reactivating it.
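To make the constraints concrete, the following is a minimal sketch of a career consistency check covering c1 to c4, applied to the example career just described. It is not the RDQA procedure of this paper. It rests on several assumptions that the text does not state: `c_flag` is encoded as "FT"/"PT", dates are ISO strings, a worker holds at most one active contract per employer, and a full‐time contract may coexist with part‐time ones (the text does not rule this out). The names `check_career` and `_breach` are hypothetical.

```python
from typing import NamedTuple

K = 2  # maximum number of simultaneous part-time contracts (value assumed in the text)


class Event(NamedTuple):
    e_date: str   # ISO date string, so lexicographic order is chronological
    e_type: str   # "start" | "cessation" | "extension" | "conversion"
    c_flag: str   # "FT" | "PT" (assumed encoding)
    c_type: str   # e.g. "fixed-term", "unlimited-term" (assumed encoding)
    empr_id: str


def _breach(active, empr_id, c_flag, k):
    """Return "c1"/"c2" if holding a c_flag contract with empr_id breaks them."""
    others = [flag for emp, (flag, _) in active.items() if emp != empr_id]
    if c_flag == "FT" and "FT" in others:
        return "c1"
    if c_flag == "PT" and others.count("PT") >= k:
        return "c2"
    return None


def check_career(events, k=K):
    """Return (event, violated constraint) pairs; an empty list means consistent."""
    active = {}       # empr_id -> (c_flag, c_type) of the active contract
    violations = []
    for ev in sorted(events, key=lambda e: e.e_date):
        current = active.get(ev.empr_id)
        new = (ev.c_flag, ev.c_type)
        if ev.e_type == "start":
            # Assumption: one active contract per employer at a time.
            problem = ("trivial: contract already active" if current is not None
                       else _breach(active, ev.empr_id, ev.c_flag, k))
            if problem:
                violations.append((ev, problem))
            else:
                active[ev.empr_id] = new
        elif current is None:
            violations.append((ev, "trivial: no active contract with employer"))
        elif ev.e_type == "cessation":
            del active[ev.empr_id]
        elif ev.e_type == "extension":
            if new != current:        # c3: extension must leave flag and type unchanged
                violations.append((ev, "c3"))
        elif ev.e_type == "conversion":
            if new == current:        # c4: conversion must change flag or type
                violations.append((ev, "c4"))
            else:
                problem = _breach(active, ev.empr_id, ev.c_flag, k)
                if problem:
                    violations.append((ev, problem))
                else:
                    active[ev.empr_id] = new
        else:
            violations.append((ev, "unknown event type"))
    return violations


# The consistent career from the text: part-time with i, part-time with j,
# cessation of the contract with j, then a new part-time contract with j.
career = [
    Event("2010-01-01", "start", "PT", "fixed-term", "i"),
    Event("2010-03-01", "start", "PT", "fixed-term", "j"),
    Event("2010-06-01", "cessation", "PT", "fixed-term", "j"),
    Event("2010-09-01", "start", "PT", "fixed-term", "j"),
]
print(check_career(career))  # []
```

Appending a third simultaneous part‐time start with another employer to this career would be reported as a c2 violation, since two part‐time contracts are already active. Violating events are reported but not applied to the state, which is one possible design choice among several.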