I've been waiting for this question to pop up, and it did. Here's a quick response. More later.
A few quick points:
1: The AI community had no problem calling Harold Cohen's AARON work 'AI' for over 3 decades, featuring it prominently at various AI conferences and in AI publications, even though, once you get into the specifics of it, it was just a hand crafted rule based system, encoding 'artistic knowledge' in that hand crafted structure.
Just to be clear, we love all things AARON. I wish Cohen had written more about the specifics of its implementation. Unfortunately, he is no longer with us.
2: There is currently a huge fascination with deep learning neural net systems in the AI community, and in the general press, and by diffusion from that press response to the general public at large. And people should be excited; these systems can do amazing things. They are going to revolutionize society.
All of the growth in cloud based GPU computing resource consumption is being driven by deep learning architectures being run in the cloud (the take away message from Jensen's keynote at GTC yesterday).
But how do they really work?
What are these deep learning systems really doing?
We are just going to focus on image based stuff for the purposes of this conversation. But extrapolate that out to audio, or 3D point clouds, or chunks of text, or whatever else you care to.
Deep learning neural networks take a set of images (the data), and then they learn statistics associated with that collection of images. What they learn (the statistics they model) is only as good as the data they are trained on.
And there are 'priors' built into the system. These 'priors' are actually innate knowledge built into the system (inadvertently in many cases) by the architecture constructors themselves. And they are also hand crafted into the overall system by the kinds of data augmentation the user of the system works with.
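Here's a tiny sketch of what I mean by that second kind of hand crafted prior. None of this is Studio Artist code, and the specific transforms are just typical examples; the point is that each augmentation is a human statement about which image changes the system should treat as meaningless.

```python
# A minimal sketch (not Studio Artist code) of how data augmentation bakes
# human assumptions into a learning system: each transform below is a
# hand-crafted statement that "this change should not alter what the image is".
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Apply a couple of typical augmentations to an H x W x 3 image array."""
    # Assumed prior: a mirrored face/cat/chair is still the same object.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Assumed prior: the object's identity does not depend on exact framing.
    h, w, _ = image.shape
    dy, dx = rng.integers(0, h // 8 + 1), rng.integers(0, w // 8 + 1)
    image = image[dy:h - dy, dx:w - dx, :]
    return image

# Toy usage on a fake 64x64 RGB image.
fake_image = rng.random((64, 64, 3))
print(augment(fake_image).shape)
```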
So already these magical 'software 2.0' learning systems have some human hand crafted knowledge built into them.
The end result is that the trained neural net system ends up with some kind of feature space representation of the statistical properties of the data it was trained on, internal to the neural net model used for the training.
Now what these systems are really doing is function approximation. A deep neural net can (in theory) learn any function approximation; it is a universal function approximator. So it's learning some kind of nonlinear transformation, usually an extremely high dimensional nonlinear transformation.
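If the function approximation framing seems too abstract, here's a toy numpy sketch (purely illustrative, nothing to do with Studio Artist internals): a tiny two layer network fit to a simple nonlinear 1-D function with plain gradient descent. The real systems do the same kind of thing in vastly higher dimensions.

```python
# A minimal sketch of the "function approximation" point: a tiny two-layer
# network fit to a nonlinear 1-D function with plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)                      # the nonlinear function to approximate

hidden = 32
W1 = rng.normal(0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)

lr = 0.1
for step in range(5000):
    h = np.tanh(x @ W1 + b1)       # hidden layer: nonlinear transformation
    pred = h @ W2 + b2             # output layer
    err = pred - y
    # Backpropagate the mean squared error.
    grad_pred = 2 * err / len(x)
    grad_W2 = h.T @ grad_pred; grad_b2 = grad_pred.sum(0)
    grad_h = grad_pred @ W2.T * (1 - h ** 2)
    grad_W1 = x.T @ grad_h; grad_b1 = grad_h.sum(0)
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print("final mean squared error:", float(np.mean(err ** 2)))
```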
Now as it turns out, the real world and imagery associated with the real world (images of faces, or people, or cats and dogs, chairs, whatever) all lives on a low dimensional manifold. Higher than 2 dimensions, higher than 3 dimensions, higher than 4 dimensions, but way lower than would be the case if the structure of the universe, the structure of images, were all just random.
And the trained neural net works in some sense because the real world lives on this low dimensional manifold (higher than the 2 dimensional plane an image is laid out on, higher than the 3 dimensional space you move around in, but low dimensional compared to what it could be if things were just random).
Think of an image that is just uniform random noise. It doesn't look like anything except noise. Now think of an image of a person's face. There is a lot of structure in that face image, starting locally (adjacent pixels are similar, not radically different). And if you think about how the face changes as it rotates, or as the light hitting it changes direction, there is an inherent similarity associated with those changes. That information lives on a low dimensional manifold.
Google manifold if the mathematical abstraction seems too obtuse. Or manifold learning.
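Or here's a quick way to see the manifold point numerically. This is a toy illustration (the 'structured' patches are just synthetic sinusoid patches standing in for real imagery): patches with real structure need far fewer principal components to explain most of their variance than patches of pure noise.

```python
# A minimal sketch of the low-dimensional-manifold point: structured patches
# need far fewer principal components to explain their variance than patches
# of pure random noise. Purely illustrative, not Studio Artist code.
import numpy as np

rng = np.random.default_rng(0)
n, size = 2000, 16                     # number of patches, each patch 16x16

# "Structured" patches: smoothly varying 2-D sinusoids (a tiny parametric family).
yy, xx = np.mgrid[0:size, 0:size] / size
structured = np.stack([
    np.sin(2 * np.pi * (xx * rng.uniform(0.5, 2)
                        + yy * rng.uniform(0.5, 2)
                        + rng.uniform()))
    for _ in range(n)
]).reshape(n, -1)

# "Noise" patches: every pixel independent and random.
noise = rng.random((n, size * size))

def components_for(data, frac=0.95):
    """How many principal components explain `frac` of the variance."""
    centered = data - data.mean(axis=0)
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    cum = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cum, frac) + 1)

print("structured patches:", components_for(structured), "components")
print("noise patches:     ", components_for(noise), "components")
```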
But what about the human visual system?
How does it really work?
The human visual system also internally models something associated with the statistics of the real world images it perceives. It does this via a series of transformational mappings applied to the incoming visual signal as it passes through successive processing layers of the visual cortex.
This is analogous to what is going on in the successive layers inside of a deep learning neural network, but also different, because what the neural net learns is ultimately tied to the task it was trained on. The function approximation might be specific to that task, and the specific kind of function approximation that solves the task could be very different than the human perceptual representation that occurs in the brain.
You can google visual cortex if you want to learn more.
I have been involved in the application of human visual models to various engineering problems for 40 years at this point, from my master's thesis onwards. If you are curious about the internal structure and gory details of these kinds of engineering models based on neuroscience and psychophysical vision experiments, you can check out this post.
If you don't think that these kinds of models and associated research have anything to do with the human perception of artwork, I would encourage you to check out these books written by extremely esteemed neuroscience researchers.
Vision and Art by Margaret Livingstone here, including a foreword by Hubel himself (Nobel Prize for his work in visual perception).
Inner Vision: An Exploration of Art and the Brain by Semir Zeki here. Zeki basically discovered cortex area V4 and how it works.
It is a very sad statement that neither this book nor Zeki's other super great human vision book, 'A Vision of the Brain' (which you can find here), is available in an e-book Kindle format to read electronically.
Studio Artist is not a NIPS paper. We are not trying to find a new solution to some greatly constrained technical problem that we can then write a paper about so we can get tenure or a job at Google or Facebook.
Studio Artist is not a neuroscience research project. We do track the neuroscience research, we do track the work of the theorists (like Tomaso Poggio at MIT) who try to build theories of how it all works mathematically (check out this lecture if you are curious). We do track the academic research of art theorists who come at this whole area from a very different perspective.
Studio Artist tries to take working metaphors from music synthesis and music synthesizers and repurpose them for digital visual artists. Concepts like signal modulation (visual signal modulation) are extremely important to our world view. We try to incorporate visual signal modulators directly based on the neuroscience research into how humans perceive visual imagery, and then make them available throughout the program.
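To make the modulation idea concrete, here's a toy sketch. This is not the actual Studio Artist implementation (those details are internal to the program); it just shows the general idea of a perceptual attribute computed from the source (crude local contrast here) modulating a drawing parameter (brush size). The function names and parameter values are made up for the example.

```python
# A minimal sketch of "visual signal modulation": a perceptual attribute
# computed from the source image (local contrast) modulates a drawing
# parameter (brush size). Not Studio Artist code.
import numpy as np

def local_contrast(luminance, radius=2):
    """Crude local contrast: range of luminance in a small neighborhood."""
    padded = np.pad(luminance, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (2 * radius + 1, 2 * radius + 1))
    return windows.max(axis=(-1, -2)) - windows.min(axis=(-1, -2))

def brush_size_at(source_rgb, y, x, min_size=2.0, max_size=24.0):
    """Map high-contrast regions to small brushes, flat regions to big ones."""
    luminance = source_rgb.mean(axis=2)
    contrast = local_contrast(luminance)
    t = contrast[y, x] / (contrast.max() + 1e-8)   # 0..1 modulation signal
    return max_size - t * (max_size - min_size)

# Toy usage on a fake 64x64 RGB source image.
rng = np.random.default_rng(0)
source = rng.random((64, 64, 3))
print(brush_size_at(source, 10, 20))
```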
How humans perceive a piece of art is ultimately derived from the inherent visual statistics associated with that work of art. Some people have weird hangups about the term 'art'. So substitute 'visual image' if you are one of those people.
Visual modulation derived from human perceptual visual attributes is a key component of how Studio Artist works. And I just told you it is also a key component in how you perceive art (visual images).
Studio Artist also incorporates all kinds of internal 'heuristic' knowledge into its internal workings.
Again, we're not trying to write a NIPS paper here. We're trying to build a system for digital artists to make art. Whatever we can throw into that system to make it work better is fair game.
Studio Artist is an active dynamic system. It tries to look at an image like a person would, and then react to that stimulus by building up an art output representation of that perceived 'source'. Like an artist would look at a model (or a Polaroid photo of a model), and then re-interpret it.
The paint synthesizer is not an 'image filter'. You could probably configure it to be one if you really want it to work that way for a specific preset setting. But at its heart it's a system that does active drawing.
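Here's a toy sketch of what I mean by active drawing as opposed to filtering. The real paint synthesizer is vastly more elaborate than this; the point is just the loop structure: look at the source, decide where the canvas diverges from it the most, put down a stroke there, and repeat.

```python
# A toy sketch of "active drawing" as opposed to filtering: instead of
# transforming pixels in place, the loop repeatedly looks at the source,
# picks the spot where the canvas diverges from it the most, and lays down
# a stroke there. (The real paint synthesizer is far more elaborate.)
import numpy as np

rng = np.random.default_rng(0)
source = rng.random((64, 64, 3))        # stand-in for a real source image
canvas = np.ones_like(source)           # start from a white canvas

def dab(canvas, source, y, x, radius=4):
    """One 'stroke': a soft circular dab of the source color at (y, x)."""
    h, w, _ = canvas.shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
    canvas[mask] = 0.7 * canvas[mask] + 0.3 * source[y, x]

for _ in range(500):
    # "Look" at the source: where does the canvas differ from it the most?
    error = np.abs(canvas - source).sum(axis=2)
    y, x = np.unravel_index(np.argmax(error), error.shape)
    dab(canvas, source, y, x)

print("remaining mean error:", float(np.abs(canvas - source).mean()))
```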
Yes, I have also dived into the literature on the neurobiology of muscle movement planning in the cortex.
The initial prototype implementation of the paint synthesizer was conceived to drive a robot for physical painting. That quickly changed as people started using it. That has always been my philosophy: start an engineering project, get it to a place very quickly where people can start actually using it, then react and adapt to how those people use it, and the whole project takes on a life of its own.
Studio Artist is not trying to replace the human artist. We are trying to augment the human artist. The human artist can choose what level they want that augmentation to occur at. Studio Artist can literally do all of the work, or Studio Artist can aid the artist as the human artist manually paints, or Studio Artist can wiggle bits in the paint while the human artist does all of the stylus driving.
Studio Artist V5.5 is all about expanding the range of what the program can automatically assist the human artist with, letting the program automatically and intelligently build new presets for the system. As opposed to the human artist having to do all of that manually.
Studio Artist V5.5 is also about expanding the range of what a 'source' for a digital art program even means.
Neural nets work off of a database of images, learning to model the statistics inherent in that database. But an artist can work directly with the database of images, feeding it into Studio Artist, and then generating artwork created from the aggregate statistics inherent in that collection of images. You don't necessarily have to train an abstract model to learn those statistics in some hidden latent space of that model to generate artwork derived from them.
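Here's a toy sketch of that idea. It has nothing to do with how Studio Artist does it internally, and the folder path and the use the palette gets put to are purely hypothetical; it just shows one way to pull an aggregate statistic (a shared color palette) directly out of a collection of images with no trained model in between.

```python
# A toy sketch of working directly with the aggregate statistics of an image
# collection instead of training a model on it: pool pixels from every image
# in a folder and reduce them to a shared color palette with plain k-means.
import glob
import numpy as np
from PIL import Image   # assumes Pillow is installed

def aggregate_palette(folder, n_colors=8, samples_per_image=2000, iters=20):
    rng = np.random.default_rng(0)
    pixels = []
    for path in glob.glob(folder + "/*.jpg"):
        img = np.asarray(Image.open(path).convert("RGB"), dtype=float) / 255.0
        flat = img.reshape(-1, 3)
        pick = rng.choice(len(flat), size=min(samples_per_image, len(flat)),
                          replace=False)
        pixels.append(flat[pick])
    pixels = np.concatenate(pixels)                 # pooled pixels, all images
    centers = pixels[rng.choice(len(pixels), n_colors, replace=False)]
    for _ in range(iters):                          # plain k-means
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_colors):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    return centers                                  # the aggregate palette

# Hypothetical usage: derive a shared palette from a folder of source images.
# palette = aggregate_palette("my_source_images")
```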
All of this sounds like intelligent behavior to me in some sense. And it is happening in a computer program, so it is in some sense artificial.
I guess it depends on your definition of intelligence. And your definition of artificial.
Some of the intelligent behavior may seem stupid at times. But I could say that about people.
Some of the intelligent behavior might seem rote or repetitive in nature, but I could say the same thing about people.
Some of the intelligent behavior is derived from heuristic rules hand crafted by people, but I could say the same thing about the behavior of people.
'But it's all based on statistics,' some people might say. But I could say the same thing about neural nets.
Again, at the end of the day, it's a system to help digital artists make artwork. Nothing more, nothing less.
And that's really how one should judge it.
Could it be better at certain things? Sure, absolutely.
Are we going to make it better? We are certainly trying.