As many people are likely aware, managers and data scientists are struggling with the gap between client demands and the capabilities of the tool sets at their disposal: Hadoop, MongoDB, HANA, Tableau, and QlikView are proving extremely inefficient and expensive for very limited results. The truth of the matter is that we do not have a Big 'Data' problem; the entire collection of human data 'elements' is fewer than 2 million objects, so what we really have is a 'Data Management' problem.
Despite the best efforts of the best computer scientists in the world, these tools will all fail, simply due to the mathematics of the situation. Dr. Codd devised six forms of data normalization, yet in the 45 years since (1971), no database system has gone beyond the 3rd Normal Form, with the exception of AtomicDB, whose (N)th Normal Form improves on Dr. Codd's work.
There is a method to his theory: each level of normalization brings a geometric increase in efficiency and scalability. Yet no one has even attempted to meet the restrictions of the 4th Normal Form (eliminating the redundancy of multi-valued dependencies), let alone the 5th or 6th. Now, with the creation of the (N)th Normal Form, we can do anything the human mind can devise on a computer, in nearly real time.
Imagine how much raw data can be processed now: we can do in hours what would normally take months, even years.
Forms of Data Normalization
Data ingestion refers to the process of importing and processing raw data from different sources into a common location for future usage and storage. Datafication, on the other hand, denotes the process of converting physical events, either in the form of measurements (quantitative data) or qualitative models (analytical, computational, or empirical), into a common, computable representation. Thus, data ingestion is usually the precursor to datafication. It has largely been done using relational and hierarchical databases or tuple stores consisting of linked 2-D tables or tuples, with table entries or tuple entities representing the data attributes of interest. The attributes can be automatically extracted once they are defined by the domain experts. This technique has been effective in managing homogeneous (identical-modality) and stationary data sets of similar spatio-temporal scales. However, it becomes extremely inefficient and often infeasible for complex cyber-physical systems (CPS), where the data sources are continuously operational, inherently heterogeneous, and involve multiple spatio-temporal scales with varying attributes for each modality-scale combination. All of our use cases present such challenges, necessitating the development of a novel data ingestion and, subsequently, datafication paradigm.
While commonly contextualized data entities will be physically co-located in the organizational vector space, a virtual pointer-like token (the basic construct representing a relationship in an associative dimension of the vector space) provides the means to 'connect' anything existing in vector space to anything else anywhere in vector space. Each token uniquely identifies a particular atom of information, is the virtual location of the item in vector space, and is the logical address of where the item exists on the physical storage medium. This capability removes the requirement for physical co-location to articulate sub-spaces, using instead 'associative nearness' or 'dimensional proximity', allowing endless clustering possibilities for collections of data elements in an unlimited number of sub-spaces that are not restricted by the physical locations of the items on the storage medium. This multifaceted, holographic-like synthesis is key to the complex mapping required in CPS systems, as it allows data to be viewed from virtually any perspective without the need to additionally search and process the data.
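The token mechanism described above can be pictured with a minimal sketch. This is an illustrative toy, not AtomicDB's actual implementation; the class and method names below are assumptions. The point it demonstrates is that a token serves both as an atom's identity and as the key to its storage location, and that associations are simply token pairs, so items can be 'connected' without being physically co-located.

```python
import itertools

class VectorSpace:
    """Toy sketch: tokens identify atoms and locate them in storage."""
    def __init__(self):
        self._next = itertools.count()
        self._storage = {}        # token -> stored bytes (stands in for a disk address)
        self._associations = {}   # token -> set of associated tokens

    def put(self, data: bytes) -> int:
        token = next(self._next)          # unique token = virtual + logical address
        self._storage[token] = data
        self._associations[token] = set()
        return token

    def associate(self, a: int, b: int) -> None:
        # Bi-directional link: physical co-location is never required.
        self._associations[a].add(b)
        self._associations[b].add(a)

    def neighbours(self, token: int):
        # Follow the tokens directly; no search over the storage medium.
        return [self._storage[t] for t in self._associations[token]]

space = VectorSpace()
cell = space.put(b"battery-cell-7")
spec = space.put(b"open-circuit-voltage: 3.7V")
space.associate(cell, spec)
print(space.neighbours(cell))   # atoms reachable by token, without scanning storage
```

Because the association is stored with each token rather than derived by a query, retrieval is a direct index operation in this sketch, mirroring the 'no search' claim made later in the document.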
Mapping of anything to anything across any of a billion associative dimensions is an inherent capability of AtomicDB™. These dimensions have three distinct forms to discriminate functional perspectives and to qualify what can be acted upon and by whom. Specifically, one form is used to model and coordinate record-oriented data sets and data streams, one to segregate the activities of specific users, applications, or CPS system components, and one to model any number of semantic namespaces, taxonomies, and/or ontologies. This discrimination is valuable when trying to model, map, and coordinate numerous different datasets and data sequences with distributed capabilities, APIs, and processing functions within a CPS system, where tabular and node-graph (tuple-based) systems are inherently inadequate. AtomicDB™ handles this easily.
The Human Brain
One such paradigm already exists in nature in the form of human long-term memory, which is able to process, encode, consolidate, retrieve, recollect, and forget information from the constant influx of data streams arriving via the sensory organs. We draw inspiration from this paradigm to propose using an associative memory (AM) tool, developed by Atomic DB Corp, which can address all four Vs of CPS big data to realize domain-agnostic data ingestion and datafication.
We first replace the data items stored in the tables or tuples with elementary 'atoms' of information that reside in a common n-dimensional (n ~ 1 billion) associative vector space of contextual associations among the data attributes. The information atoms are then stored in the physical memory using another organizational vector space with 2^120 distinct item locations. In this way, the AtomicDB™ AM tool (Figure 1) provides a unified and compact representation that can accommodate any data type, dynamic or stationary, structured or unstructured, of arbitrary size, origin, and granularity.
The atomic pieces of information are represented as byte arrays of arbitrary sizes where the maximum size limit is determined by policy or, in its absence, enforced by the operating system constraints. The associations among the data items are naturally formed based on all the attributes such as the names, counts, hierarchical relationships, and both quantitative and qualitative properties of the items. Of course, these attributes will vary widely among items representing fundamentally different physical entities such as battery cells and turbine blades, but will be typically identical, albeit with different values, for the same entity type. For example, two battery cells may have different voltage specifications but will have exactly the same set of properties like open circuit voltage, maximum current, heat dissipation rate, charge/discharge rate, dimensions, etc. Thus, various instances of identical entity types will lie in the same sub-spaces of the n-dimensional associative vector space, whereas instances of different entities will occupy different sub-spaces therein.
Multiple occurrences of the same entity instance are automatically unified and represented as the same atomic piece of information, and the transactional integrity of the relationships of each instance is maintained. The provision to add more attributes, and thereby increase the dimensionality of the occupied sub-space, is always available. Note that the sub-spaces for different entities may overlap, indicating the presence of common attributes among the entities under consideration. This overlap builds associative 'bridges' at a contextual level, enabling automated discovery and correlation. Furthermore, all the items of a particular entity type will be contextually connected to each other, and so are coerced to be physically co-located for access efficiency.
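The sub-space and 'bridge' ideas from the two paragraphs above can be sketched very simply. This is a toy illustration under an assumed simplification: that the set of attribute names alone defines an entity's sub-space. The entity and attribute names are invented for the example.

```python
def subspace_key(attributes: dict) -> frozenset:
    """The set of attribute *names* defines the sub-space an entity occupies."""
    return frozenset(attributes)

# Two battery cells: different values, identical attribute set.
cell_a = {"open_circuit_voltage": 3.7, "max_current": 2.0, "charge_rate": 0.5}
cell_b = {"open_circuit_voltage": 3.2, "max_current": 1.5, "charge_rate": 1.0}
# A turbine blade: a fundamentally different entity type.
blade  = {"length_mm": 120.0, "material": "Ti-6Al-4V", "max_rpm": 15000}

# Same entity type -> same sub-space, despite different attribute values.
assert subspace_key(cell_a) == subspace_key(cell_b)
# Different entity types -> different sub-spaces.
assert subspace_key(cell_a) != subspace_key(blade)

# Overlapping attribute names act as associative 'bridges' between sub-spaces.
sensor = {"max_current": 0.1, "sample_rate_hz": 100}
bridge = subspace_key(cell_a) & subspace_key(sensor)
print(bridge)  # the shared attribute connecting the two entity types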
A cluster of related attributes effectively defines a sub-space. Items within a cluster with a high degree of similarity will typically have a high density of interrelatedness. Items belonging to different clusters may also have some connections between them, but those will be much sparser. These connections are always bi-directional and dynamic, enabling the addition of new associations and automatically re-qualifying the sub-space as more data is ingested and models are datafied.
Thus, AtomicDB™ provides a natural and convenient way to encode contexts among seemingly disparate data sources and models, while providing a common framework to house and represent all forms of physical activities, events, and entities, and simultaneously providing a means to transparently add meta-information to any single piece of data or any data set. A key novelty of the tool is that one no longer needs to search to identify associations among the data items, as all such associations are maintained as tokens coincident with the related data items. Thus, they can be obtained merely by referencing the items of interest and using the aforementioned tokens found therein to directly (through a contextual mapping algorithm) index the referenced items in the storage medium. This novelty, coupled with other associative memory functionality, provides the capability for real-time correlation of semantic information, enabling one to build an adaptable and evolvable knowledge representation framework directly within the tool itself.
Do not manage data ‘structures’
Most systems spend substantial IT resources just managing structure. In AtomicDB, the data is the structure: it stays on disk and is referenced via associations in vector space, with pointers to the data.
Single Instance Storage
There is never any duplication in our database. This has a dramatic impact on speed, simplicity, production time, and storage, reducing storage requirements by as much as 60-90%.
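Single-instance storage can be illustrated with a content-addressed toy store: identical data collapses to one physical copy, and every occurrence shares the same token. This is a sketch of the general technique, not AtomicDB's actual mechanism; the class name and the use of SHA-256 as the identity function are assumptions for the example.

```python
import hashlib

class SingleInstanceStore:
    """Toy content-addressed store: identical data is held exactly once."""
    def __init__(self):
        self._items = {}

    def put(self, data: bytes) -> str:
        token = hashlib.sha256(data).hexdigest()  # identity derived from content
        self._items.setdefault(token, data)       # duplicates collapse to one copy
        return token

    def bytes_stored(self) -> int:
        return sum(len(v) for v in self._items.values())

store = SingleInstanceStore()
# Ten repeated records and one distinct record, as raw input.
records = [b"ACME Corp, 123 Main St"] * 10 + [b"Widget Ltd, 9 High Rd"]
tokens = [store.put(r) for r in records]

raw = sum(len(r) for r in records)
print(f"raw input: {raw} bytes, physically stored: {store.bytes_stored()} bytes")
```

With highly repetitive real-world data, this kind of collapse is where storage reductions in the quoted 60-90% range would come from.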
Data type agnostic
Ability to combine any number of disparate data sets into one, and then create any number of sub-models on the fly.
No query language
We do not query or search for data. We simply filter; in essence, everything else is excluded.
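Filtering-by-exclusion can be sketched as set intersection over association indexes: each attribute value keeps the set of item tokens associated with it, and a 'query' is just the intersection of the relevant sets. This is a hypothetical illustration of the filtering idea; the index contents and the `filter_items` helper are invented for the example.

```python
# Each (attribute, value) pair keeps the tokens of the items associated with it.
index = {
    ("type", "battery"): {1, 2, 3, 4},
    ("status", "charged"): {2, 4, 7},
    ("site", "plant-A"): {1, 2, 9},
}

def filter_items(*criteria):
    """Intersect the association sets: everything not matching every criterion
    is simply excluded, with no query language involved."""
    sets = [index[c] for c in criteria]
    return set.intersection(*sets)

hits = filter_items(("type", "battery"), ("status", "charged"), ("site", "plant-A"))
print(hits)  # -> {2}
```

Adding a criterion only shrinks the candidate set, so narrowing a result is always a cheap intersection rather than a re-executed query.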
Data volume not an issue
No matter how large the data set is, any item is only four reads away. Anyone can do this without writing a single line of code.
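The 'four reads' claim describes a fixed-depth lookup path: resolving an item costs the same small, constant number of reads regardless of total data volume. The four-level layout below is a hypothetical sketch of that idea, not AtomicDB's actual on-disk format; all names and levels are assumptions.

```python
# Four fixed indirection levels, each standing in for one physical read.
names   = {"sensor-A/temp": 42}      # read 1: name -> token
tokens  = {42: "seg-3"}              # read 2: token -> storage segment
offsets = {"seg-3": 1024}            # read 3: segment -> offset
data    = {1024: b"21.5 C"}          # read 4: offset -> payload

def resolve(name: str) -> bytes:
    """Constant four-step lookup, independent of how many items exist."""
    return data[offsets[tokens[names[name]]]]

print(resolve("sensor-A/temp"))
```

Because the depth never grows with the data set, access cost stays flat where a tree-structured index would deepen.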
Energy Saving (Green Technology)
With truly large data sets, we decrease storage by 60%-90%. There are more than 500,000 data centers worldwide, a $4.5 trillion market of which roughly one third is electrical cost. Reducing that by half would save close to one trillion dollars in electricity alone, the equivalent of 2.7 billion barrels of oil ($36/barrel) per year, not to mention millions of tons of coal. While this might seem far-fetched, consider that people pay two dollars for a small bottle of water on a planet whose surface is three-quarters water. The point is that we can claim to be a GREEN technology as well.
Any new data updates the entire system in real time, so reports are always live and current. Ad hoc reports are also delivered in real time.
Network aware database
Replicate simultaneously to over 4 billion systems, and more if we move from a 128-bit namespace to a 256-bit one.
AtomicDB's footprint is 27 MB, so it can be installed on a cellular phone, sensor, or appliance; yet its capacity exceeds an exabyte.
Simple and easy
AtomicDB has only six APIs to learn in order to perform complete database operations.
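To make the 'six APIs' idea concrete, here is a hypothetical sketch of what such a minimal surface could look like. These call names (connect, create, associate, read, related, delete) are invented for illustration; the source does not document AtomicDB's actual API.

```python
class TinyAssociativeDB:
    """Toy database exposing exactly six operations (names are assumptions)."""
    def __init__(self):
        self._items, self._links, self._next = {}, {}, 0

    def connect(self):                       # 1. open a session
        return self

    def create(self, data):                  # 2. store an item, get its token
        self._next += 1
        self._items[self._next] = data
        self._links[self._next] = set()
        return self._next

    def associate(self, a, b):               # 3. link two items bi-directionally
        self._links[a].add(b)
        self._links[b].add(a)

    def read(self, token):                   # 4. fetch an item by token
        return self._items[token]

    def related(self, token):                # 5. filter to associated items
        return {t: self._items[t] for t in self._links[token]}

    def delete(self, token):                 # 6. remove an item and its links
        for t in self._links.pop(token):
            self._links[t].discard(token)
        del self._items[token]

db = TinyAssociativeDB().connect()
a = db.create("pump-12")
b = db.create("maintenance-log-7")
db.associate(a, b)
print(db.related(a))
```

Even in toy form, the point stands: when associations are first-class, a handful of verbs covers storage, linkage, retrieval, and cleanup without a query language.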
Uncompromisable security
Security is unbreakable, thanks to the use of algorithmic, multi-level factorial encryption.
Inherently, AtomicDB is able to take any data, regardless of complexity or size, and correlate it into a useful, coherent result set, all on the fly, with no query language and no extensive programming skills required.
Aggregation of multiple data sources into a data warehouse
Inherently, AtomicDB is able to connect to and import from, bi-directionally, any and all information/data from various systems, including DB2, Oracle, MS SQL, MySQL, Access, Excel, and text files (*.csv, *.txt, *.xml). Through our partnership with www.crossing-tech.com and their 'Connectivity Factory' product, we are now able to connect and synchronize with over 550 disparate data sources and apply ETL and business rules on the fly.
The 800-pound gorilla in the room when it comes to 'Big Data', for every single client and vendor, is the ability to remove duplicate information and/or automatically correct bad information or data. No vendor today has this ability inherent in their systems, simply because those systems are based on the 3rd Normal Form, which limits their ability and efficiency. We have an automated data-cleansing capability through what we call 'Reflexive Association'. This process is possible only because of the nature, architecture, and design of AtomicDB. While all other systems must rely on string and/or binary comparisons and mountains of SQL (or other) queries, AtomicDB focuses on the association elements and compares them, and is thus several times more efficient and more accurate.
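The contrast between string comparison and association comparison can be sketched as follows. 'Reflexive Association' is AtomicDB's own term, and its actual algorithm is not described in the source; the overlap logic below is an assumed stand-in, with invented record data, to illustrate the general idea of comparing the associations a record participates in rather than its raw text.

```python
# Each record is represented by the set of contextual atoms it is associated
# with, not by its raw string form.
records = {
    "r1": {("city", "Boston"), ("zip", "02101"), ("name", "J. Smith")},
    "r2": {("city", "Boston"), ("zip", "02101"), ("name", "John Smith")},
    "r3": {("city", "Denver"), ("zip", "80201"), ("name", "A. Jones")},
}

def likely_duplicates(a: str, b: str, threshold: float = 0.5) -> bool:
    """Jaccard overlap of association sets: no string-by-string comparison."""
    overlap = len(records[a] & records[b]) / len(records[a] | records[b])
    return overlap >= threshold

print(likely_duplicates("r1", "r2"))  # True: shared context outweighs the differing name
print(likely_duplicates("r1", "r3"))  # False: almost no shared associations
```

The differing name strings ("J. Smith" vs "John Smith") would defeat a naive string match, but the shared city and zip associations still flag the pair as one entity.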
In the last decade, the reporting industry has grown to huge proportions, with dynamic products such as Tableau, QlikView, Pentaho, iDashboard, etc. While these vendors make big claims, they are essentially glorified reporting systems, no more dynamic than the old Crystal Reports of a bygone era. The data is still only as fast and as accurate as the query and the underlying SQL database server.
Data visualization still needs to be addressed with the creation of a new user interface. We are incorporating over a hundred basic charts and graphs for common use, and making available over 150 other types of gauges, graphs, and maps to visualize the data any way the customer wants, all through web services.
We have collaborated with www.gencodestudio.com, a company that has created a vector-based 3D visual rendering engine allowing for the creation of anything from avatars to virtual reality. AtomicDB is the only vector-based information system in the world capable of matching up with this technology to create spectacular multi-dimensional data visualization. This technology will also be incorporated as an option for clients, beyond the standard visualization fare of all other vendors.