Wednesday, April 08, 2015

Quoting Statistics

Whether you are a prosecution or defence barrister, quoting statistical facts to the jury has its benefits. Using stats is not without its pros and cons, however. With the ever-increasing size and quantity of network traffic and stored data, it appears inevitable that the way data is described to a jury in meaningful statistical statements is being redefined on an annual basis. For example, compare Big Data (http://en.wikipedia.org/wiki/Big_data) and analysis of data at the transport layer level (Internet Small Computer System Interface (iSCSI) Protocol (Consolidated) - http://www.rfc-editor.org/rfc/rfc7143.txt).

Example 1 - GSM SIM Card Authentication
The 2G digital mobile telephone (GSM) system, as you know, makes use of a SIM card. The security implemented in the SIM by those commissioned to create it (Mouly, M; & Pautet, M-B; published 1992) works as follows: for a subscriber identified by the subscriber identity (IMSI), the secret key (Ki) and a random challenge (RAND) are fed to the security algorithms A3/A8 (COMP128) to produce a Signed RESponse (SRES). In consequence, the probability of any other subscriber producing the same SRES (to make a mobile call, with or without ciphering), it has been said, can be in the order of 1 chance in 4 billion.



A counter-argument might be that, with repeated use of the TMSI, ciphering key, etc., the order of chance may be considerably lower, but it has yet to be shown to fall below 1 chance in 2 billion in the ordinary use of the security. An analysis of the 3G and 4G authentication algorithms suggests that the order of magnitude has again increased well beyond 2G.
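As a rough check on the two figures quoted above, the sketch below relates them to bit widths. It assumes the SRES is a 32-bit value and that every value is equally likely, which is an idealisation of COMP128 rather than a measured property. The function names are illustrative only.

```python
import math

SRES_BITS = 32  # width of the GSM Signed RESponse, in bits


def one_in(bits: int) -> int:
    """Number of equally likely values for a response of the given width."""
    return 2 ** bits


def equivalent_bits(one_in_n: float) -> float:
    """Bit width that corresponds to odds of '1 chance in one_in_n'."""
    return math.log2(one_in_n)


# A 32-bit response has a little over 4 billion possible values.
print(f"1 chance in {one_in(SRES_BITS):,}")

# Odds of 1 in 2 billion correspond to roughly one bit fewer.
print(f"1 in 2 billion is about {equivalent_bits(2e9):.1f} bits")
```

Seen this way, moving from 1 in 4 billion to 1 in 2 billion amounts to losing about one bit of the response.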

However, the above would have no relevance where a call appears in a call record but was not added as a consequence of the subscriber having made the call. As an example, upon checking my son's billing record I found numerous entries of £3.50 for a call, added at regular intervals and at exactly the same time of day, after 3pm. The operator was not able to confirm that a call had even taken place, and so removed all those charges. This highlights how call records can and do get manipulated. Had the account been pre-paid, what would have been the chances of identifying those calls?

Example 2 - DNA (Profiles, Loci et al)
The principal prosecutor, Assistant U.S. Attorney Michael T. Ambrosino (2006), countered that there was no scientific controversy and that prosecutors should not have to qualify their assertion that the rarity of Jenkins's profile among African Americans was one in 26 quintillion (26,000,000,000,000,000,000).
http://www.washingtonpost.com/wp-dyn/content/article/2006/04/14/AR2006041401602.html
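To put a figure like "one in 26 quintillion" into scale, one simple (and hedged) approach is to multiply the quoted rarity by the size of a population to get the expected number of coincidental matches. The sketch below assumes the quoted rarity can be treated as a plain probability and that people in the population are unrelated and independent. The population of 7 billion is an illustrative round number, not a figure from the case.

```python
RARITY_ONE_IN = 26 * 10**18  # "one in 26 quintillion", as quoted above


def expected_matches(population: int, one_in: int) -> float:
    """Expected number of people sharing the profile purely by chance."""
    return population / one_in


# Illustrative population only (assumption): 7 billion people.
ILLUSTRATIVE_POPULATION = 7 * 10**9

matches = expected_matches(ILLUSTRATIVE_POPULATION, RARITY_ONE_IN)
print(f"Expected coincidental matches: {matches:.2e}")
```

The arithmetic says nothing about whether the independence assumptions hold, which is where any qualification of the statement would come in.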


Chimera
A chimera is an organism which exhibits chimerism. Chimerism is the occurrence of more than one genetically distinct cell line in the same individual. Natural chimerism is quite rare in humans, but much more common in lower species. Natural chimerism occurs when the early embryos of two fraternal twins fuse into a single embryo, producing an individual with tissues of two different genetic compositions. Artificial chimerism is the result of organ or tissue transplants between individuals. The journal Nature had an excellent article on human chimerism in Volume 417, Pages 10-11 (02 May 2002).

Association of pigmentary anomalies with chromosomal and genetic mosaicism and chimerism.
Thomas IT, Frias JL, Cantu ES, Lafer CZ, Flannery DB, Graham JG Jr.
Department of Pediatrics, University of Nebraska Medical Center, Omaha.


We have evaluated eight patients with pigmentary anomalies reminiscent of incontinentia pigmenti or hypomelanosis of Ito. All demonstrated abnormal lymphocyte karyotypes with chromosomal mosaicism in lymphocytes and/or skin fibroblasts. In seven the skin was darkly pigmented, and in all of these seven cases the abnormal pigmentation followed (**)Blaschko lines. The literature contains at least 36 similar examples of an association between pigmentary anomalies and chromosomal mosaicism, as well as five examples of an association with chimerism. The pigmentary anomalies are pleomorphic, and the chromosomal anomalies involve autosomes and sex chromosomes. The pigmentation patterns are reminiscent of the archetypal paradigm seen in allophenic mice and demonstrate the clonal origin of melanoblasts from neural crest precursors. Patients with anomalous skin pigmentation, particularly when it follows a pattern of Blaschko lines, should be appropriately evaluated for a possible association with chromosomal or genetic mosaicism or chimerism.

(**)Blaschko lines are chevron-type alternating patterns that appear in skin pigmentation associated with chimerism, giving a directly observable symptom of at least dermal chimerisation.
Am J Hum Genet. 1989 Aug;45(2):193-205
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=2667350

http://www.ncbi.nlm.nih.gov/pubmed/2667350?dopt=Abstract

The above examples provide some observations about the pros/cons of quoting stats.

Digitally speaking, we sometimes have to refer to the size/quantity of data, too. It is useful therefore to have some analogies that can be used to convey the size/quantity of data:

Example 3 - Bits, Nibbles and Bytes

http://highscalability.com/blog/2012/9/11/how-big-is-a-petabyte-exabyte-zettabyte-or-a-yottabyte.html

Bytes (8 bits)
◾0.1 bytes: A binary decision
◾1 byte: A single character
◾10 bytes: A single word
◾100 bytes: A telegram OR A punched card

Kilobyte (1000 bytes)
◾1 Kilobyte: A very short story
◾2 Kilobytes: A Typewritten page
◾10 Kilobytes: An encyclopaedic page OR A deck of punched cards
◾50 Kilobytes: A compressed document image page
◾100 Kilobytes: A low-resolution photograph
◾200 Kilobytes: A box of punched cards
◾500 Kilobytes: A very heavy box of punched cards

Megabyte (1 000 000 bytes)
◾1 Megabyte: A small novel OR A 3.5 inch floppy disk
◾2 Megabytes: A high resolution photograph
◾5 Megabytes: The complete works of Shakespeare OR 30 seconds of TV-quality video
◾10 Megabytes: A minute of high-fidelity sound OR A digital chest X-ray
◾20 Megabytes: A box of floppy disks
◾50 Megabytes: A digital mammogram
◾100 Megabytes: 1 meter of shelved books OR A two-volume encyclopaedic book
◾200 Megabytes: A reel of 9-track tape OR An IBM 3480 cartridge tape
◾500 Megabytes: A CD-ROM OR The hard disk of a PC

Gigabyte (1 000 000 000 bytes)
◾1 Gigabyte: A pickup truck filled with paper OR A symphony in high-fidelity sound OR A movie at TV quality
◾2 Gigabytes: 20 meters of shelved books OR A stack of 9-track tapes
◾5 Gigabytes: An 8mm Exabyte tape
◾10 Gigabytes:
◾20 Gigabytes: A good collection of the works of Beethoven OR 5 Exabyte tapes OR A VHS tape used for digital data
◾50 Gigabytes: A floor of books OR Hundreds of 9-track tapes
◾100 Gigabytes: A floor of academic journals OR A large ID-1 digital tape
◾200 Gigabytes: 50 Exabyte tapes

Terabyte (1 000 000 000 000 bytes)
◾1 Terabyte: An automated tape robot OR All the X-ray films in a large technological hospital OR 50000 trees made into paper and printed OR Daily rate of EOS data (1998)
◾2 Terabytes: An academic research library OR A cabinet full of Exabyte tapes
◾10 Terabytes: The printed collection of the US Library of Congress
◾50 Terabytes: The contents of a large Mass Storage System

Petabyte (1 000 000 000 000 000 bytes)
◾1 Petabyte: 5 years of EOS data (at 46 Mbps)
◾2 Petabytes: All US academic research libraries
◾20 Petabytes: Production of hard-disk drives in 1995
◾200 Petabytes: All printed material OR Production of digital magnetic tape in 1995
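The EOS analogy in the list above can be checked with simple arithmetic. This is a minimal sketch that assumes the quoted rate means 46 megabits per second sustained continuously, with 365-day years. The function name is illustrative.

```python
PETABYTE = 10**15  # decimal petabyte, as defined in the list above


def bytes_transferred(megabits_per_second: float, years: float) -> float:
    """Total bytes moved at a constant rate over the given number of years."""
    seconds = years * 365 * 24 * 60 * 60
    bytes_per_second = megabits_per_second * 1_000_000 / 8
    return bytes_per_second * seconds


total = bytes_transferred(46, 5)
print(f"{total / PETABYTE:.2f} PB")  # about 0.91 PB
```

Under those assumptions, five years at that rate comes to roughly nine tenths of a petabyte, so the "1 Petabyte" analogy is a reasonable rounding.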

Exabyte (1 000 000 000 000 000 000 bytes)
◾5 Exabytes: All words ever spoken by human beings.
◾From Wikipedia:
◾The world's technological capacity to store information grew from 2.6 (optimally compressed) exabytes in 1986 to 15.8 in 1993, over 54.5 in 2000, and to 295 (optimally compressed) exabytes in 2007. This is equivalent to less than one 730-MB CD-ROM per person in 1986 (539 MB per person), roughly 4 CD-ROMs per person in 1993, 12 CD-ROMs per person in the year 2000, and almost 61 CD-ROMs per person in 2007. Piling up the imagined 404 billion CD-ROMs from 2007 would create a stack from the earth to the moon and a quarter of this distance beyond (with 1.2 mm thickness per CD).
◾The world’s technological capacity to receive information through one-way broadcast networks was 432 exabytes of (optimally compressed) information in 1986, 715 (optimally compressed) exabytes in 1993, 1,200 (optimally compressed) exabytes in 2000, and 1,900 in 2007.
◾According to the CSIRO, in the next decade, astronomers expect to be processing 10 petabytes of data every hour from the Square Kilometre Array (SKA) telescope.[11] The array is thus expected to generate approximately one exabyte every four days of operation. According to IBM, the new SKA telescope initiative will generate over an exabyte of data every day. IBM is designing hardware to process this information.
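Two of the exabyte statements above can be reproduced with a few lines of arithmetic: the CD-ROM stack and the CSIRO rate for the SKA. The sketch assumes decimal units throughout and a mean earth-to-moon distance of 384,400 km (an assumption not stated in the quoted text). The function names are illustrative.

```python
MEGABYTE = 10**6
PETABYTE = 10**15
EXABYTE = 10**18

EARTH_MOON_KM = 384_400  # assumption: mean earth-to-moon distance


def cd_count(total_bytes: float, cd_bytes: int = 730 * MEGABYTE) -> float:
    """Number of 730 MB CD-ROMs needed to hold total_bytes."""
    return total_bytes / cd_bytes


def stack_height_km(discs: float, thickness_mm: float = 1.2) -> float:
    """Height of a stack of discs, converted from millimetres to kilometres."""
    return discs * thickness_mm / 1_000_000


discs = cd_count(295 * EXABYTE)
height = stack_height_km(discs)
print(f"{discs / 1e9:.0f} billion CD-ROMs, stack of {height:,.0f} km")
print(f"Stack is {height / EARTH_MOON_KM:.2f} times the earth-to-moon distance")

# SKA: 10 petabytes per hour, expressed as days needed to reach one exabyte.
days_per_exabyte = EXABYTE / (10 * PETABYTE) / 24
print(f"One exabyte every {days_per_exabyte:.1f} days")
```

Both results agree with the quoted text: about 404 billion discs stacking to roughly one and a quarter times the distance to the moon, and about one exabyte every four days at the CSIRO rate.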

Zettabyte (1 000 000 000 000 000 000 000 bytes)
◾From Wikipedia:
◾The world’s technological capacity to receive information through one-way broadcast networks was 0.432 zettabytes of (optimally compressed) information in 1986, 0.715 in 1993, 1.2 in 2000, and 1.9 (optimally compressed) zettabytes in 2007 (this is the informational equivalent to every person on earth receiving 174 newspapers per day).[9][10]
◾According to International Data Corporation, the total amount of global data is expected to grow to 2.7 zettabytes during 2012. This is 48% up from 2011.[11]
◾Mark Liberman calculated the storage requirements for all human speech ever spoken at 42 zettabytes if digitized as 16 kHz 16-bit audio. This was done in response to a popular expression that states "all words ever spoken by human beings" could be stored in approximately 5 exabytes of data (see exabyte for details). Liberman did "freely confess that maybe the authors [of the exabyte estimate] were thinking about text."[12]
◾Research from the University of Southern California reports that in 2007, humankind successfully sent 1.9 zettabytes of information through broadcast technology such as televisions and GPS.[13]
◾Research from the University of California, San Diego reports that in 2008, Americans consumed 3.6 zettabytes of information.
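The gap between the 42 zettabyte and 5 exabyte estimates for human speech is easier to see when both are converted into a duration of audio. The sketch below assumes uncompressed mono audio at 16 kHz with 16-bit (2-byte) samples and 365-day years. It only shows how much recorded speech each figure would hold, not how either original estimate was derived.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
EXABYTE = 10**18
ZETTABYTE = 10**21
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def audio_years(total_bytes: float, channels: int = 1) -> float:
    """Years of continuous audio that fit in total_bytes at the rate above."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


print(f"42 ZB holds about {audio_years(42 * ZETTABYTE):.2e} years of speech")
print(f"5 EB holds about {audio_years(5 * EXABYTE):.2e} years of speech")
```

At this audio format, 5 exabytes holds only a few million years of continuous speech, which is consistent with the suggestion that the exabyte estimate was about text rather than audio.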

Yottabyte (1 000 000 000 000 000 000 000 000 bytes)

See - http://en.wikipedia.org/wiki/Talk%3AYottabyte#Xenottabyte.3F_Shilentnobyte.3F_Domegemegrottebyte.3F

Other interpretations, see  - http://geekologie.com/2010/06/how-big-is-a-yottabyte-spoiler.php
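When a raw byte count has to be put in front of a non-technical audience, it can help to convert it into the largest sensible unit from the ladder above first. This is a minimal sketch using the decimal (powers of 1000) definitions given in the headings. The function name and output format are illustrative choices.

```python
UNITS = [
    "bytes", "Kilobytes", "Megabytes", "Gigabytes", "Terabytes",
    "Petabytes", "Exabytes", "Zettabytes", "Yottabytes",
]


def describe_size(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps it under 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {UNITS[-1]}"  # not reached; kept for completeness


print(describe_size(500_000))     # 500 Kilobytes
print(describe_size(5 * 10**18))  # 5 Exabytes
```

The result can then be paired with the matching analogy from the lists, for example "5 Exabytes: All words ever spoken by human beings".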
