Architecting FAIR Digital Objects and Computational Workflows: Interoperable Metadata, Persistent Identification, and Reproducible Research Infrastructures
Department of Information Studies, University of Copenhagen, Denmark
Abstract
The exponential growth of data-intensive research across life sciences, Earth sciences, and computational domains has exposed profound challenges in interoperability, reproducibility, and long-term stewardship of digital research assets. The FAIR principles have provided a normative framework for making digital objects Findable, Accessible, Interoperable, and Reusable. However, the operationalization of FAIR at scale requires coordinated infrastructures that integrate persistent identifiers, semantic web standards, research object packaging, workflow provenance capture, and policy-aligned governance mechanisms. This article presents a comprehensive theoretical and architectural synthesis of interoperable FAIR digital objects and computational workflows grounded in contemporary specifications and community-driven implementations. Drawing on standards such as the Digital Object Interface Protocol specification, PROV-O, RDF 1.1, Schema.org, OCFL, BagIt, IEEE 2791-2020, and RO-Crate, as well as community platforms including myExperiment, Whole Tale, OpenAIRE, the NCI Genomic Data Commons, and EOSC interoperability frameworks, this study constructs a layered conceptual model for digital object management. The analysis examines metadata modeling through Bioschemas and Science on Schema.org, ontology reuse, machine-actionable data management plans, and persistent identifier design patterns. Particular emphasis is placed on computational workflow reproducibility using engines such as Snakemake and Galaxy, software distribution ecosystems such as Bioconda, cross-platform packaging in Debian, and provenance frameworks including CWLProv and Pegasus. The article critically interrogates socio-technical tensions between researcher usability and stewardship compliance, drawing on debates about data management fatigue and lifestyle-oriented FAIR practice.
It advances a detailed interoperability architecture integrating digital object identifiers, research object crates, linked data graphs, and repository storage layouts compliant with OCFL and BagIt. Through extensive theoretical elaboration, the paper articulates governance principles for data commons, cross-domain metadata harmonization, and standards-based international genomic data sharing. The findings demonstrate that sustainable FAIR infrastructures emerge not from isolated tools but from coordinated ecosystems combining persistent identity, semantic richness, workflow transparency, and institutional stewardship cultures. The study concludes by outlining policy, technical, and cultural pathways toward machine-actionable, reproducible, and globally interoperable research environments.
Keywords
FAIR digital objects, research objects, computational workflows, interoperability standards
2. Garcia-Silva A., Gomez-Perez J.M., Palma R., Krystek M., Mantovani S., Foglini F., Grande V., De Leo F., Salvi S., Trasatti E., Romaniello V., Albani M., Silvagni C., Leone R., Marelli F., Albani S., Lazzarini M., Napier H.J., Glaves H.M., Aldridge T., Meertens C., Boler F., Loescher H.W., Laney C., Genazzio M.A., Crawl D., Altintas I. Enabling FAIR research in Earth science through research objects. Future Generation Computer Systems 98 (2019), 550-564.
7. Giving software its due. Nature Methods 16(3) (2019), 207.
8. Goble C. What Is Reproducibility? The R* Brouhaha. Hannover, Germany, 2016.
9. Goble C., Cohen-Boulakia S., Soiland-Reyes S., Garijo D., Gil Y., Crusoe M.R., Peters K., Schober D. FAIR Computational Workflows. Data Intelligence 2(1-2) (2019), 108-121.
10. Goble C., Soiland-Reyes S., Bacall F., Owen S., Williams A., Eguinoa I., Droesbeke B., Leo S., Pireddu L., Rodriguez-Navas L., Fernandez J.M., Capella-Gutierrez S., Menager H., Gruning B., Serrano-Solano B., Ewels P., Coppens F. Implementing FAIR digital objects in the EOSC-Life workflow collaboratory. Zenodo (2021).
11. Goble C.A., Bhagat J., Aleksejevs S., Cruickshank D., Michaelides D., Newman D., Borkum M., Bechhofer S., Roos M., Li P., De Roure D. myExperiment: A repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Research 38(Web Server issue) (2010), W677-W682.
12. Gray A., Goble C., Jimenez R. Bioschemas: From Potato Salad to Protein Annotation. Vienna, Austria, 2017.
13. Grossman R.L., Heath A., Murphy M., Patterson M., Wells W. A case for data commons: Toward data science as a service. Computing in Science and Engineering 18(5) (2016), 10-20.
14. Gruning B., Chilton J., Koster J., Dale R., Soranzo N., van den Beek M., Goecks J., Backofen R., Nekrutenko A., Taylor J. Practical computational reproducibility in the life sciences. Cell Systems 6(6) (2018), 631-635.
15. Gruning B., Dale R., Sjodin A., Chapman B.A., Rowe J., Tomkins-Tinch C.H., Valieris R., Koster J., Bioconda Team. Bioconda: Sustainable and comprehensive software distribution for the life sciences. Nature Methods 15(7) (2018), 475-476.
16. Guha R.V., Brickley D., Macbeth S. Schema.org: Evolution of Structured Data on the Web. Queue 13(9) (2015), 10-37.
17. Heath T., Bizer C. Linked Data: Evolving the Web into a Global Data Space. 2011.
18. IEEE Standard 2791-2020. IEEE Standard for Bioinformatics Analyses Generated by High-Throughput Sequencing to Facilitate Communication. 2020.
19. Jensen M.A., Ferretti V., Grossman R.L., Staudt L.M. The NCI Genomic Data Commons as an engine for precision medicine. Blood 130(4) (2017), 453-459.
20. Jones M.B., Richard S., Vieglais D., Shepherd A., Duerr R., Fils D., McGibbney L. Science on Schema.org v1.2.0. 2021.
21. Katsumi M., Gruninger M. What is ontology reuse? Formal Ontology in Information Systems. 2016.
22. Khan F.Z., Soiland-Reyes S., Sinnott R.O., Lonie A., Goble C., Crusoe M.R. Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv. GigaScience 8(11) (2019).
23. Kim J., Deelman E., Gil Y., Mehta G., Ratnakar V. Provenance trails in the Wings/Pegasus system. Concurrency and Computation: Practice and Experience 20(5) (2008), 587-597.
24. Kluyver T., Ragan-Kelley B., Perez F., Granger B., Bussonnier M., Frederic J., Kelley K., Hamrick J., Grout J., Corlay S., Ivanov P., Avila D., Abdalla S., Willing C., Jupyter Development Team. Jupyter Notebooks - a publishing format for reproducible computational workflows. 2016.
25. Kunze J., Littman J., Madden E., Scancella J., Adams C. The BagIt File Packaging Format. RFC 8493, 2018.
26. Kurowski K., Corcho O., Choirat C., Eriksson M., Coppens F., van de Sanden M., Ojstersek M. EOSC Interoperability Framework. Technical Report, 2021.
27. Lebo T., Sahoo S., McGuinness D., Belhajjame K., Cheney J., Corsar D., Garijo D., Soiland-Reyes S., Zednik S., Zhao J. PROV-O: The PROV Ontology. W3C Recommendation, 2013.
28. Leipzig J., Nust D., Hoyt C.T., Ram K., Greenberg J. The role of metadata in reproducible computational research. Patterns 2(9) (2021), 100322.
29. Miksa T., Jaoua M., Arfaoui G. Research object crates and machine-actionable data management plans. 2020.
30. Miksa T., Simms S., Mietchen D., Jones S. Ten principles for machine-actionable data management plans. PLOS Computational Biology 15(3) (2019), e1006750.
31. Mons B. Data Stewardship for Open Science. 2018.
32. Moller S., Krabbenhoft H.N., Tille A., Paleino D., Williams A., Wolstencroft K., Goble C., Holland R., Belhachemi D., Plessy C. Community-driven computational biology with Debian Linux. BMC Bioinformatics 11(Suppl 12) (2010), S5.
33. Moller S., Prescott S.W., Wirzenius L., Reinholdtsen P., Chapman B., Prins P., Soiland-Reyes S., Klotzl F., Bagnacani A., Kalas M., Tille A., Crusoe M.R. Robust cross-platform workflows. Data Science and Engineering 2(3) (2017), 232-244.
34. Neylon C. As a researcher I am a bit fed up with Data Management. 2017.
35. OCFL. Oxford Common File Layout Specification. Recommendation, 2020. https://ocfl.io/1.0/spec/.
36. RDF Working Group. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation, 2014.
37. Rehm H.L., Page A.J.H., Smith L., Adams J.B., Alterovitz G., Babb L.J., Barkley M.P., Baudis M., Beauvais M.J.S., Beck T., Beckmann J.S., Beltran S., Bernick D., Bernier A., Bonfield J.K., Boughtwood T.F., Bourque G., Bowers S.R., Brookes A.J., Brudno M., Brush M.H., Bujold D., Burdett T., Buske O.J., Cabili M.N., Cameron D.L., Carroll R.J., Casas-Silva E., Chakravarty D., Chaudhari B.P., Chen S.H., Cherry J.M., Chung J., Cline M., Clissold H.L., Cook-Deegan R.M., Courtot M., Cunningham F., Cupak M., Davies R.M., Denisko D., Doerr M.J., Dolman L.I., Dove E.S., Dursi L.J., Dyke S.O.M., Eddy J.A., Eilbeck K., Ellrott K.P., Fairley S., Fakhro K.A., Firth H.V., Fitzsimons M.S., Fiume M., Flicek P., Fore I.M., Freeberg M.A., Freimuth R.R., Fromont L.A., Fuerth J., Gaff C.L., Gan W., Ghanaim E.M., Glazer D., Green R.C., Griffith M., Griffith O.L., Grossman R.L., Groza T., Guidry Auvil J.M., Guigo R., Gupta D., Haendel M.A., Hamosh A., Hansen D.P., Hart R.K., Hartley D.M., Haussler D., Hendricks-Sturrup R.M., Ho C.W.L., Hobb A.E., Hoffman M.M., Hofmann O.M., Holub P., Hsu J.S., Hubaux J.P., Hunt S.E., Husami A., Jacobsen J.O., Jamuar S.S., Janes E.L., Jeanson F., Jene A., Johns A.L., Joly Y., Jones S.J.M., Kanitz A., Kato K., Keane T.M., Kekesi-Lafrance K., Kelleher J., Kerry G., Khor S.S., Knoppers B.M., Konopko M.A., Kosaki K., Kuba M., Lawson J., Leinonen R., Li S., Lin M.F., Linden M., Liu X., Liyanage I.U., Lopez J., Lucassen A.M., Lukowski M., Mann A.L., Marshall J., Mattioni M., Metke-Jimenez A., Middleton A., Milne R.J., Molnar-Gabor F., Mulder N., Munoz-Torres M.C., Nag R., Nakagawa H., Nasir J., Navarro A., Nelson T.H., Niewielska A., Nisselle A., Niu J., Nyronen T.H., O'Connor B.D., Oesterle S., Ogishima S., Ota Wang V., Paglione L.A.D., Palumbo E., Parkinson H.E., Philippakis A.A., Pizarro A.D., Prlic A., Rambla J., Rendon A., Rider R.A., Robinson P.N., Rodarmer K.W., Rodriguez L.L., Rubin A.F., Rueda M., Rushton G.A., Ryan R.S., Saunders G.I., 
Schuilenburg H., Schwede T., Scollen S., Senf A., Sheffield N.C., Skantharajah N., Smith A.V., Sofia H.J., Spalding D., Spurdle A.B., Stark Z., Stein L.D., Suematsu M., Tan P., Tedds J.A., Thomson A.A., Thorogood A., Tickle T.L., Tokunaga K., Tornroos J., Torrents D., Upchurch S., Valencia A., Varma S., Vears D.F., Viner C., Voisin C., Wagner A.H., Wallace S.E., Walsh B.P., Williams M.S., Winkler E.C., Wold B.J., Wood G.M., Woolley J.P., Yamasaki C., Yates A.D., Yung C.K., Zass L.J., Zaytseva K., Zhang J., Goodhand P., North K., Birney E. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genomics 1(2) (2021), 100029.
38. Rettberg N., Schmidt B. OpenAIRE. College and Research Libraries News 76(6) (2015), 306-310.
39. Sandve G.K., Nekrutenko A., Taylor J., Hovig E. Ten simple rules for reproducible computational research. PLOS Computational Biology 9(10) (2013), e1003285.
40. Schriml L.M., Chuvochina M., Davies N., Eloe-Fadrosh E.A., Finn R.D., Hugenholtz P., Hunter C.I., Hurwitz B.L., Kyrpides N.C., Meyer F., Mizrachi I.K., Sansone S.A., Sutton G., Tighe S., Walls R. COVID-19 pandemic reveals the peril of ignoring metadata standards. Scientific Data 7(1) (2020), 188.
41. Sefton P., Devine G., Evenhuis C., Lynch M., Wise S., Lake M., Loxton D. DataCrate: a method of packaging, distributing, displaying and archiving Research Objects. 2018.
42. Sefton P. FAIR Data Management; It's a lifestyle not a lifecycle. 2021.
43. Sefton P., O Carragain E., Soiland-Reyes S., Corcho O., Garijo D., Palma R., Coppens F., Goble C., Fernandez J.M., Chard K., Gomez-Perez J.M., Crusoe M.R., Eguinoa I., Juty N., Holmes K., Clark J.A., Capella-Gutierrez S., Gray A.J.G., Owen S., Williams A.R., Tartari G., Bacall F., Thelen T. RO-Crate Metadata Specification 1.0. 2019.
44. Sefton P., O Carragain E., Soiland-Reyes S., Corcho O., Garijo D., Palma R., Coppens F., Goble C., Fernandez J.M., Chard K., Gomez-Perez J.M., Crusoe M.R., Eguinoa I., Juty N., Holmes K., Clark J.A., Capella-Gutierrez S., Gray A.J.G., Owen S., Williams A.R., Tartari G., Bacall F., Thelen T., Menager H., Rodriguez-Navas L., Walk P., Whitehead B., Wilkinson M., Groth P., Bremer E., Castro L.G., Sebby K., Kanitz A., Trisovic A., Kennedy G., Graves M., Koehorst J., Leo S., Portier M. RO-Crate Metadata Specification 1.1.1. 2021.