Stats for fileset ciaworldfactbook

Summary

Triple Count: 161489
URI Count: 30338
Average URI length: 48.37, Standard Deviation: 11.33
Average URI reuse: 13.87
Appeared as (ignoring literals):
S only: 9963
P only: 87
S and P: 0
O only: 28
O and S: 20186
P and O: 74
S, P and O: 0
O including literals: 22311
Literal Count: 22283
Average literal length: 42.96, Standard Deviation: 131.07
Average literal reuse: 2.86
Blank Node Count: 0
Average Blank Node reuse: 0.00
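
The role counts above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable and function names are made up here, and the reading that "O including literals" is the "O only" figure with the literal count added is an inference from the numbers, not something the report states.

```python
# Figures copied from the Summary above; all names here are illustrative.
uri_count = 30338
literal_count = 22283
o_including_literals = 22311
role_counts = {
    "S only": 9963,
    "P only": 87,
    "S and P": 0,
    "O only": 28,
    "O and S": 20186,
    "P and O": 74,
    "S, P and O": 0,
}

def check_summary():
    # Every URI falls into exactly one role category, so the categories
    # should add up to the URI count.
    uris_by_role = sum(role_counts.values())
    # Assumed reading: "O including literals" = "O only" URIs + literals.
    o_only_with_literals = role_counts["O only"] + literal_count
    return (uris_by_role == uri_count,
            o_only_with_literals == o_including_literals)

print(check_summary())  # (True, True)
```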


Detail Navigation

Node appearances as S, P, O, SP, PO, OS
Aggregate node reuse
Node lengths


Node appearances as S, P, O, SP, PO, OS

Graph 1 shows the number of times nodes (or node pairs) of a given cardinality appear. So, if there are 200,000 nodes that appear as a Subject on three occasions, then 200,000 will be plotted at an x-position of 3 on the graph.

Graph 2 is more complex: to give a more readable graph, it plots the cumulative number of index entries, where each node contributes as many entries as its cardinality. If 100,000 nodes appear as a Subject only once, and 100,000 nodes appear as a Subject twice, then we plot points at (x=1, y=100,000) and (x=2, y=300,000). Thus, if a given Subject appears many times relative to the size of the dataset, it causes a pronounced upward tick in the graph. This second graph is useful for showing what proportion of an index over S (or P, or SP, etc.) is made up of small entries versus large ones with repeating elements.
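
As a rough illustration of how the two sets of points could be derived, here is a minimal sketch. It assumes triples are available as (s, p, o) tuples; the function name cardinality_points and the key argument are hypothetical and are not taken from the tool that produced this report.

```python
from collections import Counter

def cardinality_points(triples, key):
    """Return (graph1, graph2) point lists for one index.

    triples: iterable of (s, p, o) tuples.
    key: function selecting a node or node pair from a triple,
         e.g. lambda t: t[0] for S, lambda t: (t[0], t[1]) for SP.
    """
    per_key = Counter(key(t) for t in triples)   # cardinality of each node/pair
    histogram = Counter(per_key.values())        # cardinality -> how many nodes
    graph1 = sorted(histogram.items())
    graph2, running = [], 0
    for cardinality, nodes in graph1:
        running += cardinality * nodes           # entries contributed so far
        graph2.append((cardinality, running))
    return graph1, graph2

# One subject used once and one used twice: a scaled-down version of the
# 100,000 / 100,000 example in the text.
triples = [("a", "p", "x"), ("b", "p", "y"), ("b", "q", "z")]
graph1, graph2 = cardinality_points(triples, key=lambda t: t[0])
print(graph1)  # [(1, 1), (2, 1)]
print(graph2)  # [(1, 1), (2, 3)]
```

Passing a different key (for example lambda t: (t[1], t[2]) for PO) gives the points for the other indexes.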

Data Files: S P O SP PO OS

Cardinality      S    P      O      SP     PO      OS
Total        30149  161  42571  127531  52745  159346
1-1              0    1  33095  118222  46517  158100
2-2          12476    0   6125    5700   2878     923
3-3           5463    0    729     708    851     128
4-4           1152    3    633     516    426      44
5-5           5734    0    218     746    254      44
6-6           5058    0    234     340    198      47
7-7              0    0    131     240    144      30
8-8              0    0    120     196    125      10
9-9              0    1     82     168    112      12
10-19            0    1    380     429    419       8
20-29            0    1    176      55    174       0
30-39           22    0    111      59    114       0
40-49            7    0    111      77    186       0
50-59            0    1    135      33     91       0
60-69            2    2     47      21     33       0
70-79            3    1     26      14     19       0
80-89            0    1     11       2      9       0
90-99            5    3     14       1      9       0
100-199         54   24    107       3    110       0
200-299        133   80     43       1     44       0
300-399         36    3     17       0      8       0
400-499          3    2      2       0      1       0
500-599          1    1      5       0      6       0
600-699          0    3      1       0      1       0
700-799          0    0      4       0      3       0
800-899          0    2      2       0      2       0
900-999          0    2      0       0      0       0
1000-1999        0   14      6       0      7       0
2000-2999        0    6      4       0      2       0
3000-3999        0    2      0       0      0       0
5000-5999        0    0      1       0      1       0
7000-7999        0    1      0       0      0       0
8000-8999        0    1      0       0      0       0
9000-9999        0    4      1       0      1       0
30000-39999      0    1      0       0      0       0


Aggregate Node Reuse

These graphs illustrate the number of times nodes are reused across all elements of a triple. Graph 1 shows the number of nodes that have been reused a given number of times: if 10 nodes appear 100 times, a point will be plotted at (x=100, y=10). Graph 2 is again cumulative: if 10 nodes appear 100 times, and 2 nodes appear 101 times, points will be plotted at (x=100, y=1000) and (x=101, y=1202). This aids in visualising what proportion of the dataset is made up of heavily reused nodes versus rarely reused nodes.
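
The table for this section groups reuse counts into ranges such as 10-19 or 200-299. The sketch below shows one way such counts and ranges might be produced. The bucketing rule (range width equal to the largest power of ten not exceeding the count) is inferred from the row labels rather than documented, and the names bucket_label and aggregate_reuse are hypothetical. It does not separate URIs, literals and blank nodes, which would need a node-type check.

```python
from collections import Counter

def bucket_label(n):
    """Map a positive count to a row label such as '7-7', '10-19' or '200-299'."""
    width = 10 ** (len(str(n)) - 1)   # 1 for 1-9, 10 for 10-99, 100 for 100-999, ...
    low = (n // width) * width
    return f"{low}-{low + width - 1}"

def aggregate_reuse(triples):
    """Count how often each node occurs in any position, then bucket the counts."""
    uses = Counter(node for triple in triples for node in triple)
    return Counter(bucket_label(count) for count in uses.values())

triples = [("a", "p", "b"), ("b", "p", "c")]
print(aggregate_reuse(triples))  # "a" and "c" occur once, "p" and "b" twice
```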

Data Files: URI Literal B-Node

#Times reused    URI  Literal  Blank Node
Total          30338    22283           0
1-1                1    15830           0
2-2                0     4405           0
3-3            10723      562           0
4-4             5562      542           0
5-5             6883      163           0
6-6             5991      173           0
7-7               53       80           0
8-8               55       83           0
9-9               52       42           0
10-19            248      181           0
20-29            106       68           0
30-39             68       39           0
40-49             40       23           0
50-59             40       15           0
60-69             23       10           0
70-79             15        6           0
80-89             10        0           0
90-99              8        3           0
100-199          110       23           0
200-299          142       11           0
300-399          109       14           0
400-499           39        2           0
500-599           10        1           0
600-699            4        0           0
700-799            4        2           0
800-899            3        1           0
1000-1999         21        1           0
2000-2999          7        3           0
3000-3999          2        0           0
5000-5999          1        0           0
7000-7999          1        0           0
8000-8999          1        0           0
9000-9999          5        0           0
30000-39999        1        0           0


Node Lengths

These graphs illustrate the length in bytes of nodes. In both graphs, a node is counted only once, even if it is reused many times. Graph 1 shows the number of nodes that have a given length: if 10 nodes have a length of 100 bytes, a point will be plotted at (x=100, y=10). Graph 2 is again cumulative, plotting the total space used: if there are 10 nodes of length 100 bytes, and 2 nodes of length 110 bytes, points will be plotted at (x=100, y=1000) and (x=110, y=1220). This aids in visualising what proportion of space is taken up by nodes of a given size.
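
A minimal sketch of the length calculation follows. It assumes nodes are held as Python strings and that "length in bytes" means the UTF-8 encoded length; the report does not say which encoding was used, and the name length_points is hypothetical.

```python
from collections import Counter
from itertools import accumulate

def length_points(triples):
    """Return (graph1, graph2) for node lengths, counting each distinct node once."""
    distinct = {node for triple in triples for node in triple}
    # Assumption: byte length is measured on the UTF-8 encoding.
    by_length = Counter(len(node.encode("utf-8")) for node in distinct)
    lengths = sorted(by_length)
    graph1 = [(length, by_length[length]) for length in lengths]
    # Running total of bytes used by all nodes up to each length.
    space = accumulate(length * by_length[length] for length in lengths)
    graph2 = list(zip(lengths, space))
    return graph1, graph2

# "ab" is reused but counted once; "é" is one character but two bytes.
triples = [("ab", "p", "é"), ("ab", "p", "xyz")]
graph1, graph2 = length_points(triples)
print(graph1)  # [(1, 1), (2, 2), (3, 1)]
print(graph2)  # [(1, 1), (2, 5), (3, 8)]
```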

Data Files: URI Literal

Node Length     URI  Literal
Total         30338    22282
1-1               0       11
2-2               0      116
3-3               0     1107
4-4               0     2344
5-5               0     2119
6-6               0     1586
7-7               0     1497
8-8               0     1170
9-9               0     1162
10-19           215     4397
20-29             0     1287
30-39             0      837
40-49         26689      503
50-59          1186      506
60-69          1299     1162
70-79           529      292
80-89           195      219
90-99            86      201
100-199         129      949
200-299           5      265
300-399           1      112
400-499           1       81
500-599           1       82
600-699           0       44
700-799           2       54
800-899           0       34
900-999           0       28
1000-1099         0        2
1000-1999         0      111
2000-2999         0        4