CS191 23 04 24
Zhu, Justin

Mon, Apr 22, 2019


Codd 1972

The model in this paper is the one many people still use. The paper identifies the purposes of computers as both computation and data storage. Numerical calculation was needed to generate data on early machines such as Aiken’s Mark I; calculations involving physical phenomena require numerical data, but early computers did not have enough storage to process large amounts of it.

IBM initially tackled this issue using mechanical parts and large assemblies, and earlier systems organized data as nodes in a directed graph. Codd’s language instead derives from the predicate calculus. Data is organized as relations, which are sets of n-tuples (such as triples or quadruples), and each position, or column, contains data of a particular type.
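As a minimal sketch of that idea (not from the paper), the hypothetical Supplier relation below is just a set of typed 3-tuples; the relation name, attributes, and values are made up for illustration.

```python
# Sketch of a relation as a set of n-tuples, where each column position
# holds values of one particular type.
from typing import NamedTuple

class Supplier(NamedTuple):      # hypothetical relation: (supplier_id, name, city)
    supplier_id: int
    name: str
    city: str

# A relation is a set of tuples, so duplicate tuples collapse automatically.
suppliers = {
    Supplier(1, "Acme", "Boston"),
    Supplier(2, "Globex", "Chicago"),
    Supplier(1, "Acme", "Boston"),   # duplicate tuple, stored only once
}

print(len(suppliers))  # 2
```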

Codd’s efforts at IBM were largely based on this earlier work and concerned data dependencies.

Data independence is often cited as a major challenge: application programs and terminal activities should remain independent of data representations that change over time, and there needs to be some consistency in how the data is conveyed.

What we really know about ordering dependence is that some application programs depend on the ordering in the file structure; a distinction is made between presentation ordering and stored ordering when discussing independence.

Indexing dependence is a major concern as well: an index speeds up retrieval at the cost of slowing down responses to insertions and updates, and application programs and terminal activities should remain independent of whether an index exists.

A relation can also be seen as a table with multiple distinct rows, where the ordering of rows is immaterial but the ordering of columns is significant, as the sketch below illustrates.
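A short sketch of that point, reusing the made-up supplier rows: comparing the relations as sets shows that row order does not matter, while swapping column positions produces different tuples.

```python
# Two relations with the same rows in different orders are equal as sets.
rows_a = {(1, "Acme", "Boston"), (2, "Globex", "Chicago")}
rows_b = {(2, "Globex", "Chicago"), (1, "Acme", "Boston")}
print(rows_a == rows_b)  # True: row order is immaterial

# Swapping column positions changes the tuples themselves.
swapped = {("Acme", 1, "Boston"), ("Globex", 2, "Chicago")}
print(rows_a == swapped)  # False: column order carries meaning
```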

The model is devoid of pointers, avoiding dependence on hash addressing schemes; having no indices or ordering lists at the model level is also a major feature.

Permutation and projection are also ways to characterize these relations: they are computations over the stored relations that the database can perform to derive new relations.
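As a sketch of those operations (the helper name and relation are hypothetical, not Codd’s notation), the function below keeps selected column positions of a relation, collapsing duplicates; passing the positions in a different order acts as a permutation.

```python
# Projection keeps selected columns of a relation; the result is again a
# set of tuples, so duplicate rows collapse. Reordering the positions
# permutes the columns.
def project(relation, positions):
    """Keep only the given column positions of each tuple."""
    return {tuple(row[i] for i in positions) for row in relation}

suppliers = {(1, "Acme", "Boston"), (2, "Globex", "Boston")}

print(project(suppliers, [2]))      # {('Boston',)} -- duplicates collapse
print(project(suppliers, [1, 0]))   # permutation: {('Acme', 1), ('Globex', 2)}
```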

Jones 1972

Jones’s paper describes a weighting method, later adopted by search engines in the mid-1990s, that balances the frequency of a word against the word’s significance.

The counter-weighting factor is known as the “inverse document frequency,” or IDF. The exhaustivity of document descriptions and the specificity of index terms are usually independent. Specificity can be interpreted statistically as a function of term use rather than term meaning. The collection-frequency weighting could be evaluated on test collections compiled by others. Specific use cases start from the terms given in a request and match them against documents, enabling the analysis of natural language.
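A small sketch of the idea, assuming the common log(N / n_t) formulation of IDF (Jones’s original weighting is expressed slightly differently, and the documents here are invented): terms appearing in fewer documents receive higher weights.

```python
# IDF weighting: rare terms are counter-weighted upward, common terms downward.
import math

documents = [
    {"retrieval", "index", "term"},
    {"index", "frequency"},
    {"term", "weighting", "frequency"},
]

def idf(term, docs):
    n_t = sum(1 for d in docs if term in d)   # number of documents containing the term
    return math.log(len(docs) / n_t) if n_t else 0.0

print(idf("weighting", documents))  # appears in 1 of 3 docs -> higher weight
print(idf("index", documents))      # appears in 2 of 3 docs -> lower weight
```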

This is a pretty good formalization as a whole. I like how she defined exhaustivity as the coverage of all topics, and specificity as the precision of the topics. These criteria seem to have been instrumental in the development of modern search engines like Google.