Stats for fileset uniprot

Summary

Triple Count: 2809173894
URI Count: 391273031
Average URI length: 29.08, Standard Deviation: 13.19
Average URI reuse: 18.98
Appeared as (ignoring literals):
S only: 101791597
P only: 104
S and P: 0
O only: 39637426
O and S: 249843881
P and O: 23
S, P and O: 0
O including literals: 93843528
Literal Count: 54206102
Average literal length: 158.32, Standard Deviation: 301.96
Average literal reuse: 18.45
Blank Node Count: 0
Average Blank Node reuse: 0.00
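
The summary figures can be cross-checked against each other. The short Python sketch below copies the counts listed above and checks that the appearance classes add up to the URI count. The variable names are hypothetical, and the reading that each URI falls into exactly one appearance class is an assumption that the arithmetic happens to support.

```python
# Figures copied from the Summary above; names are illustrative only.
appearances = {
    "S only": 101_791_597,
    "P only": 104,
    "S and P": 0,
    "O only": 39_637_426,
    "O and S": 249_843_881,
    "P and O": 23,
    "S, P and O": 0,
}
uri_count = 391_273_031
literal_count = 54_206_102

# Assumption: every URI belongs to exactly one appearance class,
# so the classes should sum to the URI count.
assert sum(appearances.values()) == uri_count

# "O including literals" adds the literals to the URIs seen only as objects.
o_including_literals = appearances["O only"] + literal_count
print(o_including_literals)  # 93843528

# Distinct subjects: every class whose name includes S.
distinct_subjects = (
    appearances["S only"]
    + appearances["S and P"]
    + appearances["O and S"]
    + appearances["S, P and O"]
)
print(distinct_subjects)  # 351635478
```

The distinct-subject figure of 351,635,478 appears to be the S total reported in the node appearances table further down.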


Detail Navigation

Node appearances as S, P, O, SP, PO, OS
Aggregate node reuse
Node lengths


Node appearances as S, P, O, SP, PO, OS

Graph 1 shows how many nodes (or node pairs) appear with a given cardinality. For example, if 200,000 nodes each appear as a Subject exactly three times, then 200,000 is plotted at an x-position of 3.

Graph 2 is cumulative, which makes it easier to read: each point is the running total of entries (number of nodes multiplied by their cardinality) up to and including that cardinality. If 100,000 nodes appear as a Subject only once, and 100,000 nodes appear as a Subject twice, then points are plotted at (x=1,y=100,000) and (x=2,y=300,000). A Subject that appears many times relative to the size of the dataset therefore causes a pronounced upward tick in the graph. This second graph is useful for showing what proportion of an index over S (or P, or SP, etc.) is made up of small entries, as opposed to large ones with repeating elements.
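
As a minimal sketch of how the two series relate, the Python below derives both point lists from a list of per-node cardinalities. The function name `graph_points` and its input format are assumptions made for illustration; the tool that produced these statistics may compute the graphs differently.

```python
from collections import Counter
from itertools import accumulate


def graph_points(cardinalities):
    """Return (graph1, graph2) point lists for per-node cardinalities.

    `cardinalities` holds one integer per node (or node pair): the number
    of times it appears in the position being measured.
    """
    histogram = Counter(cardinalities)
    xs = sorted(histogram)
    # Graph 1: how many nodes have each cardinality.
    graph1 = [(x, histogram[x]) for x in xs]
    # Graph 2: running total of entries (node count times cardinality).
    running = accumulate(x * histogram[x] for x in xs)
    graph2 = list(zip(xs, running))
    return graph1, graph2


# The example from the text: 100,000 Subjects seen once, 100,000 seen twice.
graph1, graph2 = graph_points([1] * 100_000 + [2] * 100_000)
print(graph1)  # [(1, 100000), (2, 100000)]
print(graph2)  # [(1, 100000), (2, 300000)]
```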

Data Files: S P O SP PO OS

Cardinality  S  P  O  SP  PO  OS
Total  351635478  127  343687432  2151619560  591661013  2734351818
1-139749143012242232818221704034705405562678549398
2-25559068301170365072634736594173121436782765
3-316523708027114367375504093813767119019654
4-4114303410365180185194729179764421
5-55752846509400196320698952966080
6-66038859407733445298296747894740
7-7423713704443999237057225623600
8-8629510504032765150429617970580
9-9951136803219638143658114624600
10-194680588719085318774939152529010
20-293755731611516494230491310821050
30-39303443104320446227382866980
40-49162276001570855379591465000
50-59627111090547112841933960
60-69340188064624216317780200
70-7916381505195829945458700
80-894649503834838188344610
90-993241403053160174300410
100-1991134811143728214601415860
200-2993234615374630725555200
300-39915111191191273207830
400-499661211004573124970
500-5993891763236692080
600-6992721591425570730
700-7991670441915655100
800-8991121353810844660
900-99997028159733650
1000-109910421390
1000-1999522815570519204590
2000-29993501717735096970
3000-39992332519623353560
4000-49992054347420336870
5000-599985122188526170
6000-699930013773016520
7000-799919110281912170
8000-89996184169480
9000-99996070868070
10000-1999916536691647250
20000-29999121904121400
30000-399992399529120
40000-499992566925900
50000-599990144803430
60000-699992221522930
70000-799990118402410
80000-899990019302190
90000-999990113901180
100000-1999991168415700
200000-2999990818801780
300000-39999901940900
400000-49999902610580
500000-59999902410350
600000-69999903230170
700000-79999903150190
800000-8999990012090
900000-99999900100100
1000000-199999909620540
2000000-299999900110120
3000000-3999999019080
4000000-4999999005040
5000000-5999999023030
6000000-6999999014060
7000000-7999999045050
8000000-8999999011030
9000000-9999999023030
10000000-19999999012150130
20000000-29999999033050
30000000-39999999074030
40000000-49999999011010
50000000-59999999011010
60000000-69999999040000
80000000-89999999001010
90000000-99999999010010
100000000-199999999062010
200000000-299999999030000
400000000-499999999010000


Aggregate Node Reuse

These graphs illustrate how often nodes are reused across all positions of a triple. Graph 1 shows the number of nodes that have been reused a given number of times: if 10 nodes appear 100 times each, a point is plotted at (x=100,y=10). Graph 2 is cumulative: if 10 nodes appear 100 times each, and 2 nodes appear 101 times each, points are plotted at (x=100,y=1000) and (x=101,y=1202). This helps visualise what proportion of the dataset is made up of heavily reused nodes as opposed to rarely reused ones.
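
To make the "proportion of the dataset" reading concrete, the sketch below starts from an already aggregated histogram (times reused mapped to number of nodes), builds the cumulative points, and reads off the share of all node occurrences that comes from nodes reused at most a given number of times. Both function names are hypothetical and the code only illustrates the arithmetic described above.

```python
def cumulative_reuse(histogram):
    """Turn {times_reused: node_count} into cumulative (x, y) points."""
    points, total = [], 0
    for times in sorted(histogram):
        total += times * histogram[times]
        points.append((times, total))
    return points


def share_at_or_below(points, threshold):
    """Fraction of all occurrences due to nodes reused <= threshold times."""
    total = points[-1][1]
    covered = max((y for x, y in points if x <= threshold), default=0)
    return covered / total


# The example from the text: 10 nodes reused 100 times, 2 nodes reused 101 times.
points = cumulative_reuse({100: 10, 101: 2})
print(points)  # [(100, 1000), (101, 1202)]
print(share_at_or_below(points, 100))  # 1000 / 1202, roughly 0.83
```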

Data Files: URI Literal B-Node

#Times reused  URI  Literal  Blank Node
Total  391273031  54206102  0
1-130238413185071190
2-211838128111283870
3-35623164345458530
4-443321078146070560
5-53474655736503910
6-6517885613228300
7-7104463552960790
8-885302341009950
9-99769873894550
10-19818297194487700
20-29416413751516860
30-394324233735890
40-492703065431940
50-591649321297500
60-69919664219000
70-79568046167610
80-89202983133480
90-99103007118010
100-199282185693250
200-29968354335700
300-3991643284370
400-499848650910
500-599570834210
600-699409826220
700-799297519200
800-899226015180
900-999176812320
1000-109910150
1000-1999952765620
2000-2999417329670
3000-3999315420710
4000-4999211013770
5000-599914029060
6000-69998905910
7000-79997144080
8000-89995903710
9000-99994663110
10000-19999222315800
20000-2999910918190
30000-399997202830
40000-499994302430
50000-599992362100
60000-69999135840
70000-79999771090
80000-89999531400
90000-9999947940
100000-1999991964900
200000-299999691280
300000-39999939560
400000-49999920410
500000-59999914280
600000-69999912130
700000-799999890
800000-899999580
900000-999999370
1000000-199999919490
2000000-2999999670
3000000-3999999540
4000000-4999999050
5000000-5999999010
6000000-6999999120
7000000-7999999710
8000000-8999999200
9000000-9999999230
10000000-199999991950
20000000-29999999810
30000000-39999999710
40000000-49999999110
50000000-59999999200
60000000-69999999400
70000000-79999999100
90000000-99999999100
100000000-199999999710
200000000-299999999300
400000000-499999999100


Node Lengths

These graphs illustrate the length of nodes in bytes. In both graphs, a node is counted only once, however many times it is reused. Graph 1 shows the number of nodes that have a given length: if 10 nodes have a length of 100 bytes, a point is plotted at (x=100,y=10). Graph 2 plots the cumulative space used: if there are 10 nodes of length 100 bytes, and 2 nodes of length 110 bytes, points are plotted at (x=100,y=1000) and (x=110,y=1220). This helps visualise what proportion of space is taken up by nodes of a given size.
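
The following sketch shows one way such length figures could be derived from a stream of node strings. It assumes that length is measured as the size of the UTF-8 encoding, which the text does not state, and the function name `length_points` is hypothetical.

```python
from collections import Counter


def length_points(nodes):
    """Return (graph1, graph2) for node byte lengths, counting each node once."""
    distinct = set(nodes)  # a reused node contributes a single entry
    # Assumption: byte length is the length of the UTF-8 encoding.
    histogram = Counter(len(node.encode("utf-8")) for node in distinct)
    graph1 = sorted(histogram.items())
    graph2, used = [], 0
    for length, count in graph1:
        used += length * count  # cumulative space in bytes
        graph2.append((length, used))
    return graph1, graph2


# The example from the text: 10 nodes of 100 bytes and 2 nodes of 110 bytes.
sample = [f"{i:0100d}" for i in range(10)] + [f"{i:0110d}" for i in range(2)]
sample += sample[:3]  # repeated nodes do not change the result
graph1, graph2 = length_points(sample)
print(graph1)  # [(100, 10), (110, 2)]
print(graph2)  # [(100, 1000), (110, 1220)]
```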

Data Files: URI Literal

Node Length  URI  Literal
Total  391273031  54206101
1-1064
2-202190
3-3031265
4-40191842
5-50375910
6-60596624
7-701031449
8-801193425
9-902101651
10-1919043957320780878
20-291241672825
30-39639180561560708
40-491310360141207753
50-595695675750303
60-69161304701346
70-797430762338
80-898280710313
90-992293684261
100-19942655954322
200-299174616940
300-39903200410
400-49901831947
500-59901203891
600-6990696823
700-7990473241
800-8990351584
900-9990278286
1000-109902449
1000-199901090341
2000-29990110266
3000-3999023946
4000-499908578
5000-599904627
6000-699901245
7000-79990811
8000-89990383
9000-99990282
10000-199990372
20000-299990141
30000-39999069
40000-4999902