examineRDF is a tool I developed during my PhD to produce statistics about large RDF datasets on machines with limited memory and CPU. Example output is shown below.
How it works
examineRDF uses the Redland Raptor library to parse incoming triples, then uses a fixed-size hash table to group related data into manageable chunks across a set of temporary output files, which it then processes in more detail. This technique lets it handle very large datasets while still scaling linearly with dataset size. Its main limitation is that it cannot currently produce statistics that require joins within the data.
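The partition-then-process idea can be illustrated with a minimal Python sketch. This is not examineRDF's actual code: the function names, the bucket count, and the choice of statistic (triples and distinct subjects per predicate) are all assumptions made for illustration, and a naive whitespace split of N-Triples-style lines stands in for the Raptor parser.

```python
import os
import tempfile
import zlib
from collections import defaultdict

NUM_BUCKETS = 16  # hypothetical fixed number of partitions


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def partition(lines, workdir, num_buckets=NUM_BUCKETS):
    """Pass 1: stream the triples, writing each one to its bucket's temp file."""
    paths = [os.path.join(workdir, f"bucket_{i}.nt") for i in range(num_buckets)]
    files = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Simplified parse (assumption): subject and predicate have no spaces.
            parts = line.split(None, 2)
            if len(parts) < 3:
                continue
            # All triples sharing a predicate land in the same bucket.
            files[bucket_for(parts[1], num_buckets)].write(line + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def stats_per_predicate(paths):
    """Pass 2: process one bucket at a time, keeping only its keys in memory."""
    results = {}
    for path in paths:
        counts = defaultdict(int)
        subjects = defaultdict(set)
        with open(path, encoding="utf-8") as f:
            for line in f:
                subject, predicate, _ = line.split(None, 2)
                counts[predicate] += 1
                subjects[predicate].add(subject)
        # Buckets hold disjoint predicates, so results never collide.
        for predicate, n in counts.items():
            results[predicate] = (n, len(subjects[predicate]))
    return results


def predicate_stats(lines, num_buckets=NUM_BUCKETS):
    """Return {predicate: (triple count, distinct subject count)}."""
    with tempfile.TemporaryDirectory() as workdir:
        return stats_per_predicate(partition(lines, workdir, num_buckets))
```

Because every triple for a given key ends up in exactly one bucket, each bucket can be analysed independently and its working set discarded before the next one is read. The same property explains the limitation noted above: a statistic that needs to join triples stored in different buckets cannot be computed in a single per-bucket pass.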
As it stands, examineRDF takes about 2 hours to process a billion triples on a dual 1.8 GHz Opteron machine with 8 GB of DDR1 memory. It can run with less memory with very little performance loss.
How can I get it?
examineRDF can be found on the University of Southampton ECS Forge.