
Data Modeling & The Art Of The Possible in AI

The world is a chaotic place: from the German physicist Werner Heisenberg's uncertainty principle to the butterfly effect, the idea that small things can have non-linear impacts on a complex system, our lives are fraught with randomness & stochasticity.

Indeed, life is pretty unpredictable, and building AI that accounts for real-world uncertainty calls for probabilistic programming techniques, backed by powerful algorithms and by compiled languages like C & scripting languages like Python that let programmers & data scientists focus on high-level ideas like functions, data flow, and big models. These languages let programmers accomplish the same tasks they did a few decades ago, but with much less code and in combination with big data & machine learning.

On the other hand, a data model is an abstract model that organizes data elements and standardizes how the data elements relate to one another and to the properties of real-world entities.

Data models: a theoretical way of understanding concepts or ideas

They are everywhere around us and we use them every day to make our lives easier. A map is a model of locations; sheet music is a model of sounds; even our brain is a model behind every decision we make.

If a data model describes the logical inter-relationships & data flow between the different data elements of the information world, data itself is the basic building block of everything you do in modeling.

Until sufficient data are available, you cannot form any theories or draw any conclusions. However, while data is important, the right data is essential. It is easy to feel overwhelmed by the increasing amounts of data being collected. Understanding what’s important to the business helps data scientists evaluate which data counts, or should be counted. And this is where Big Models come in.

Indeed, as opposed to Big Data, Big Models do not require tons of data to train the system: the more prior knowledge we put in, the less training the system needs and the smarter we can make it.

A Big Model consists in the system learning, essentially, a story about how the data came to be. Looking at an image, for example: the data in the image came to be because light reflected off a dog and hit the aperture of your camera. The dog has a face, 2 eyes, 2 triangular ears, 4 legs, etc. It typically takes around 50 lines of code and a few examples to train the model. It is a rule-based system somewhere between the Expert System & Deep Learning.
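To make the "generative story" idea concrete, here is a minimal sketch in plain Python. It uses a toy example of my own (a noisy line, not the dog image): the model is simply the data-generating process written down as a short program, and a handful of examples is enough to recover its parameter. All names and numbers here are illustrative assumptions.

```python
import random

def generative_story(n_points, slope=2.0, noise=0.5):
    """The model *is* the story of how the data came to be:
    each y was produced from an x by a line plus sensor noise."""
    xs = [random.uniform(0, 10) for _ in range(n_points)]
    ys = [slope * x + random.gauss(0, noise) for x in xs]
    return xs, ys

# A few examples suffice to learn the story's parameter.
random.seed(1)
xs, ys = generative_story(20)
slope_estimate = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

Because the story explains *how* the data arise, twenty points pin down the slope quite accurately, where a story-free approach might need far more data.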

“We can’t always track what we want to count, but that doesn’t stop us from constantly exploring new ways to get the data we need.” — Albert Einstein

The richer the story you tell the computer about the data (provided it is the right story and actually describes the data correctly), the more information or knowledge you pour into the system, the faster it learns, and the less labeled data it needs in order to learn; in the unsupervised case, it may need no labels at all.

The game of developing data models isn’t over. It’s just the beginning

Like children, successful data models need continuous nurturing and monitoring throughout their lifecycle.

I am the father of 3 boys, so I know from experience that parenting is a lot of hard work; at the same time it is also exhilarating, and taking care of a new life can be all-consuming and very challenging.

Once the model is built and its implementation effective, it can bring additional value to an organization. But just as data model development is planned, continuous monitoring strategies should also be planned and well anticipated.

Assets should be gathered for that journey during the early stages of model development and should not be an afterthought. Further, validating and revalidating models, to ensure expected performance and to discourage drift toward discrimination, is critical to fair and responsible AI.

In addition, model implementation & deployment, monitoring, testing, and evaluation are at least as important as model development. To use the example of raising kids above: there is much more important work to be done after the birth.

Automatic data modeling using Bayesian synthesis of probabilistic programs

Bayesian methods remain the best way to express probability theory & the theory of uncertainty. Probabilistic programming coupled with a Bayesian approach is the key to building solid data (& machine learning) models that make good approximations & accurate predictions on real-world data by adjusting their parameters.

Bayesian program synthesis

Automated data modeling via Bayesian synthesis of probabilistic programs from the observed data, using Python and/or a set of Domain-Specific Languages (DSLs), is an approach of choice for building large-scale Bayesian models & systems that learn, essentially, the story of how the data came to be.

Indeed, probabilistic programming is a powerful abstraction layer for Bayesian inference: it separates the model-specification part of the problem, suitable for probabilistic programming & based on maximum marginal likelihood, from the inference part.
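This separation of model from inference can be sketched in a few lines of plain Python (a toy coin-flip model with grid-approximation inference, purely illustrative, not any particular probabilistic programming system): the prior and likelihood define the model, while a generic routine computes the posterior from whatever model it is given.

```python
# Model: specified separately from inference.
def prior(theta):
    return 1.0  # uniform prior over theta in (0, 1)

def likelihood(theta, data):
    # Bernoulli likelihood for coin flips (1 = heads, 0 = tails)
    heads = sum(data)
    tails = len(data) - heads
    return theta ** heads * (1 - theta) ** tails

# Inference: a generic engine that works for any prior/likelihood pair.
def grid_posterior(prior, likelihood, data, grid_size=1000):
    thetas = [i / grid_size for i in range(1, grid_size)]
    weights = [prior(t) * likelihood(t, data) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]

thetas, post = grid_posterior(prior, likelihood, data=[1, 1, 1, 0])
posterior_mean = sum(t * p for t, p in zip(thetas, post))
```

Swapping in a different prior or likelihood changes the model without touching the inference engine, which is the abstraction probabilistic programming provides at scale.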

Our brand new startup: BayesLearn Systems

BayesLearn Systems is the startup I founded last year, in the middle of the Covid pandemic, with the mission to accelerate the adoption of AI and make probabilistic programming & Bayesian inference broadly accessible. At BayesLearn Systems, we build Bayesian probabilistic programs that model your sparse data and simulate its generative story, how your data came to be.

Each program tries as many possibilities as it can, through the random variables in the model, to match the data, and gives the posterior distribution as output. Determinism can also be part of the model when we have a pretty good idea of the story that created the data.
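The "try possibilities and keep the ones that match the data" idea is, in its simplest form, rejection sampling. Here is a hedged toy sketch in plain Python (my own illustration, not our production code): candidate parameters are drawn from the prior, the generative program is run with each, and only candidates whose simulated data match the observation survive; the survivors are samples from the posterior.

```python
import random

def simulate(theta, n):
    """Run the generative program: n coin flips with bias theta."""
    return sum(random.random() < theta for _ in range(n))

def rejection_posterior(observed_heads, n, n_tries=20_000):
    """Try many random parameter values; keep those whose simulated
    data match the observation. The kept values approximate the posterior."""
    kept = []
    for _ in range(n_tries):
        theta = random.random()  # draw a candidate from the prior
        if simulate(theta, n) == observed_heads:
            kept.append(theta)
    return kept

random.seed(0)
samples = rejection_posterior(observed_heads=7, n=10)
posterior_mean = sum(samples) / len(samples)
```

Exact matching only works for small discrete data; real systems relax the match criterion or use smarter inference, but the output is the same object: a posterior distribution over the story's unknowns.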

Core solution: BayesianRhapsody™

BayesianRhapsody™ is our AI solution that could potentially be used in healthcare, pharmaceutical & medical industries, biotech, bioinformatics, fintech & banking, insurance, manufacturing, logistics & distribution, and even automotive, defense, and sports analytics. We are talking to people about global warming & climate data and other datasets.


We are here to guess what the “music” might do in the future and use the right “instruments” to keep it going.

Because this is the key success factor, our project team works in collaboration with client business teams to bring expertise adapted to each need: building the right Bayesian probabilistic programs that model the data, running the data-generating process, and then making Bayesian inference accordingly.

No doubt, the circle of life is as real in AI model development as it is in the human lifecycle. Given the significant time, energy, and resources invested in our AI projects, it’s easy to feel like a proud parent after a successful model build. By mindfully approaching the model development lifecycle, particularly with Big Models, you can learn from the lessons acquired through continuous model management and produce more responsible and efficient AI models over the long haul.
