Catching up

When I started working with image processing at ContextVision AB (CONTX) in 1984, there was very little digital data available, CPUs were clocking at 16 MHz, and processing a 1024 by 1024 image was not entirely straightforward with only a few megabytes of memory (which in fact was a lot at that time).

My first task, despite the paucity of data, clock cycles, and megabytes, was to design a classifier, mainly for medical and satellite images, two modalities that did produce digital data. A typical (research) use case would be to analyze satellite images for deforestation.

The most advanced classifier we used was the maximum likelihood classifier, in which the feature distribution for each class was assumed to be Gaussian; the output class was the class of the distribution with the largest likelihood. The parameters to estimate from the annotated training data were the covariance matrix and the mean vector of each distribution. With only a handful of features and classes, the total number of parameters seldom exceeded one thousand.
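As a reminder of how little machinery this takes, here is a minimal sketch of such a Gaussian maximum likelihood classifier in Python. The two-dimensional features and the class names are made up for illustration; this is not ContextVision's actual code.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean vector and 2x2 covariance matrix of 2-D samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / n
    syy = sum((y - my) ** 2 for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def log_likelihood(point, mean, cov):
    """Log of the Gaussian density at `point`, so class likelihoods are comparable."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu), with the 2x2 inverse written out
    m = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * (m + math.log(det) + 2 * math.log(2 * math.pi))

def classify(point, models):
    """Pick the class whose fitted Gaussian gives the largest likelihood."""
    return max(models, key=lambda cls: log_likelihood(point, *models[cls]))

# Toy example: two classes with hypothetical 2-D features
models = {
    "forest": fit_gaussian([(0, 0), (1, 0), (0, 1), (1, 1)]),
    "water": fit_gaussian([(5, 5), (6, 5), (5, 6), (6, 6)]),
}
print(classify((0.4, 0.6), models))  # -> forest
```

With two classes, two features, and a full covariance per class, that is twelve parameters in total, which gives a feel for why "one thousand" was a lot back then.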

Artificial neural networks were a marginal research area during the “AI winter” of the 1990s and 2000s. This changed around 2010, when artificial neural networks developed by the few survivors of the AI hibernation started to outperform other methods, particularly when trained and executed on a GPU.

“Neural networks – what’s happening in the US?” This was in 1988, so the answer was “not much”. Now things are very different, although we are still waiting for the Neuralink anticipated on the cover of the report.

ContextVision jumped on board the deep learning fast train around 2015 and has since developed many models to analyze and process medical images, mostly using convolutional networks. As the CTO of the company, I would from time to time ask the developers for access to some data so that I could do some hands-on model building. It was great fun, for instance, to show that an artificial neural network did a good job of constructing a brightness mode image (the standard output from an ultrasound machine) from the radio frequency image generated internally in the machine. It could even mimic our own proprietary image enhancement algorithms when trained end-to-end.

In 2022 we decided to spin off Inify Laboratories (INIFY) to capitalize on our investment in an AI that detected cancer in images of (sections of) prostate biopsies. I had to focus on the laboratory design, team, and process, and didn’t really have time to closely follow the development of AI, let alone write any code.

AI years are shorter than dog years so a lot happened during that time, most notably the emergence of large language models. The capabilities of LLMs and the ecosystem around them are growing fast (although I don’t think many companies are making money on AI). So I decided to take some time to delve into LLMs, learn about new ideas such as agentic AI and vector databases, and maybe write some code.

Back in the days of my open source participation, I used Microsoft’s COM framework extensively and liked it a lot. It had an almost mathematical foundation, and existing interfaces to components never changed. Updates were published in a new version of an interface, and as a user of the component I could choose if and when to start using the new interface.

The AI platform APIs are the exact opposite. Backward compatibility is definitely not a priority.

I asked ChatGPT for some example code for using the OpenAI API. It gave me an example with a deprecated function call. It also gave me a deprecated model name. When I pointed this out, it replied: “You’re hitting a common issue: gpt-4 is not a valid model name for the new OpenAI SDK (v1.0+). It now requires specific, current model names, not aliases.” (A naive user might ask why ChatGPT withheld this “common issue” in its first answer.)

I also tried my luck with Gemini. It actually produced working OpenAI API code on its first attempt, which is funny. I guess nobody’s a prophet in their own land.

I then tried my supposedly correct programs with both the OpenAI API and the Gemini API. Both gave me error messages to the effect that I had exceeded my free quota. This was even before my first successful model call. Free tokens seem to be just that – a token offer. I yielded and paid $5 for a bunch of tokens for OpenAI. With that my “Hello, AI!” finally told me that 1 + 1 equals 2. With tokens to spare.
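For reference, my "Hello, AI!" looked roughly like the sketch below, written against the v1.0+ OpenAI Python SDK. The model name and the question are my own placeholders (model names churn, as noted above), and the network call only runs if an API key is set.

```python
import os

def build_chat_request(question):
    # Request shape for the chat completions endpoint of the OpenAI
    # Python SDK v1.0+. The model name is an assumption - check the
    # currently available models before relying on it.
    return {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": question}],
    }

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install "openai>=1.0"

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(**build_chat_request("What is 1 + 1?"))
    print(response.choices[0].message.content)
```

Without a funded account, the `create` call is exactly where the quota error appears.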

My next step was to try to create a simple “agentic AI”, meaning that the AI decides what to do to accomplish a task, including calls to other tools such as database searches, calculators, web searches, and basically anything that has an API or can be reached through a function call.
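Stripped of any particular vendor API, the agent loop itself is short: the model either answers or asks for a tool, and the host code executes the tool and feeds the result back. In the sketch below the model is faked with a hand-written function, and the message format is my own simplification, not any real provider's schema.

```python
import json

def run_agent(ask_model, tools, user_prompt, max_steps=5):
    # Generic agent loop. `ask_model(messages)` returns either
    #   {"type": "answer", "text": ...}                       (done), or
    #   {"type": "tool_call", "name": ..., "arguments": {...}} (run a tool).
    # This is a simplified stand-in for a real tool-calling API response.
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = ask_model(messages)
        if reply["type"] == "answer":
            return reply["text"]
        result = tools[reply["name"]](**reply["arguments"])
        messages.append({"role": "tool", "name": reply["name"],
                         "content": json.dumps(result)})
    raise RuntimeError("agent did not finish within max_steps")

def fake_model(messages):
    # Stand-in for an LLM: first request the calculator, then answer.
    if messages[-1]["role"] == "user":
        return {"type": "tool_call", "name": "add", "arguments": {"a": 1, "b": 1}}
    return {"type": "answer", "text": messages[-1]["content"]}

tools = {"add": lambda a, b: a + b}
print(run_agent(fake_model, tools, "What is 1 + 1?"))  # -> 2
```

The real difficulty is not this loop but getting an actual model to pick the right tool with the right arguments at the right time, which is where the nagging below comes in.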

I will not bore you with all the trials and tribulations. My experience so far is that getting an agentic AI to do what you want by prompting it is like getting your teenager to clean up their room. After nagging for half a day you realize that it is easier to do it yourself. I’m hopeful though, with all the billions invested, that the S-curve is still steep.
