examineRDF

examineRDF is a tool I developed during the course of my PhD to produce statistics about large RDF datasets on machines with limited memory and CPU. Example output is shown below.

Example output

UniProt
DBpedia
CIA World Factbook
105M triple BSBM dataset

How it works

examineRDF uses the Redland Raptor library to parse incoming triples, then uses a fixed-size hash table to group related data into manageable chunks across a set of temporary output files, which it then processes in more detail. Using this technique it can handle pretty much any dataset while still scaling linearly with dataset size. It is limited in that it cannot currently produce statistics that require joins within the data.

As it stands, examineRDF takes about 2 hours to process a billion triples on a dual 1.8 GHz Opteron machine with 8GB of DDR1 memory. It can run with less memory with very little performance loss.

How can I get it?

examineRDF can be found at the University of Southampton ECS Forge.
