01-Visual Machine Learning 

OVERVIEW


In this class we will be looking at Machine Learning as it relates to visuals: images, videos, etc. We’ll examine a few practical applications:


Image Classification
  • Allowing ML models to distinguish between one class of images and another
  • Training a model through transfer learning to recognize new images under our own guidance.
  • Real-time classification of recognized objects in video using models pre-trained on ImageNet.

Image Regression
Predicting a continuous value (rather than a label) from processed images, which we can then use to control on-screen objects

Style Transfer
Applying the inferred style of one image to other images

Current State of Visual Machine Learning
Practical examples of ML in the Visual Arts, including GANs, DeepPose, and more.

MACHINE LEARNING WITH IMAGES

Time to put this knowledge to use - starting with images and CNNs.

CNNs


Image learning implementations most commonly use Convolutional Neural Networks (CNNs). When applied to images, these networks take small parts of an image, identify their features, and combine the findings of many neurons to make assumptions about the image as a whole.

In a CNN, the convolution is the process of combining one set of numbers (a patch of the input) with another (a kernel of learned weights) to arrive at a single output number.

Weights determine how much influence the input of a neuron has on the output.

Cost is synonymous with error - essentially it’s just how far off we were from an actual value.

The mathematical operation that gives us this output number is the dot product. In image processing, the numbers being multiplied are the pixel color values at each (x, y) location and the kernel's weights.
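To make the dot product concrete, here is a minimal sketch (not from the class files) of a single convolution step: one 3x3 patch of pixel values multiplied element-wise by a 3x3 kernel of weights and summed into a single number. The specific values are made up for illustration.

const patch = [            // brightness values of a 3x3 neighborhood of pixels
  [10, 10, 10],
  [10, 255, 10],
  [10, 10, 10],
];
const kernel = [           // weights: this kernel responds strongly to a bright center
  [0, 0, 0],
  [0, 1, 0],
  [0, 0, 0],
];

let output = 0;
for (let y = 0; y < 3; y++) {
  for (let x = 0; x < 3; x++) {
    output += patch[y][x] * kernel[y][x]; // multiply element-wise and sum: the dot product
  }
}
console.log(output); // 255 - a single number summarizing this patch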


Pooling

To reduce dimensionality and speed up processing we use pooling layers - they downsample each feature map, summarizing small regions (for example, by taking the maximum or average value in each region) into a smaller matrix.
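As a concrete (made-up) illustration, here is a minimal sketch of 2x2 max pooling in plain JavaScript: each 2x2 block of a feature map is reduced to its largest value, halving the width and height.

const featureMap = [
  [1, 3, 2, 0],
  [4, 2, 1, 1],
  [0, 1, 5, 2],
  [2, 0, 3, 4],
];

const pooled = [];
for (let y = 0; y < featureMap.length; y += 2) {
  const row = [];
  for (let x = 0; x < featureMap[y].length; x += 2) {
    row.push(Math.max(
      featureMap[y][x],     featureMap[y][x + 1],
      featureMap[y + 1][x], featureMap[y + 1][x + 1]
    ));
  }
  pooled.push(row);
}
console.log(pooled); // [[4, 2], [2, 5]] - a summary of the same map with a quarter of the numbers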


Starting with an initial set of measured values, we derive features, i.e. generalizations the network can use for interpretation and prediction.

Bias reflects how many assumptions a model makes about the target function. High bias means the model makes more (and stronger) assumptions; low bias means it makes fewer assumptions.

How Neural Networks Understand Images



Convnet Viewer


We can see what a convolutional neural network sees by using Convnet Viewer. Download it here and unzip it somewhere on your Mac. If you run it, it will access your webcam by default, and you should see something like the below grid of images. View the annotations to see what is being represented.

Looking at Convolutions



Synthesizing Images to Identify Specific Categories

In reverse, we can synthesize images which maximally activate the neurons responsible for identifying a particular class of image. How? Gradient descent again. The difference is that the weights are fixed this time, and the pixels themselves are adjusted.
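As a toy sketch of the idea (not from the class files), here is what "adjust the pixels, not the weights" looks like with TensorFlow.js, the library ml5 is built on, assuming it is loaded so tf is available. The toyActivation function is a stand-in for a real network's class score - here it simply rewards brightness, so gradient ascent on the pixels makes the image brighter.

const toyActivation = (pixels) => pixels.mean(); // hypothetical "neuron": the brighter the image, the higher the activation

const gradFn = tf.grad(toyActivation); // d(activation) / d(pixels) - no weights are changed

let pixels = tf.randomUniform([28, 28]); // start from noise
for (let i = 0; i < 100; i++) {
  const g = gradFn(pixels);
  pixels = pixels.add(g.mul(1.0)); // gradient ascent on the pixels themselves
}
pixels.mean().print(); // the mean brightness has crept upward - the image now excites our toy neuron more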


Look familiar? Sort of like something we saw earlier. 

Deep Dream

Deep Dream attempts to take a real image, observe its activations, and do pixel gradient descent to amplify those activations, tweaking the pixels to find patterns. 

Why so many puppy slugs?


Because the training set contained an overwhelming number of dog-breed classes, producing many more dog-related activations. The network's neurons are hyper-alert to finding those patterns.

Image Classification


Predicting a label (mapping input to discrete output) for what we see in an image, vs a value.

When might this be used?

We will be using ML5 for most of our image processing in class. ML5 is a machine learning library built on top of tensorflow.js, modeled on the coding conventions of p5.js/Processing.

The ML5 Image Classifier is trained on ImageNet - a database of 15 million images.

We can apply it to our own unique ends with the help of Transfer Learning - accessing knowledge gained from solving one problem and applying it to another problem. This reduces the number of variables we have to solve for. In the case of images, the transferred knowledge might include feature or edge detection, thresholding, etc. Because our network already knows how to do these things, having been trained on a huge database of images, our own image-related tasks become much faster and more accurate.

Using a pre-built image classifier

This will be the first time you run any of the class examples. So first we need to get the code.

  • Make a directory on your computer to hold the class files. 
  • Start a web server:
  • python -V - gives you your python version.
  • Python 2: python -m SimpleHTTPServer
  • Python 3: python -m http.server
  • Open your browser to localhost:8000
  • Open the 00-imagenet-classifier.html file

This example uses the MobileNet classifier to make predictions about a photo you give it to analyze. In this example, it correctly identifies an image of a bird. MobileNet is a small, fast model trained on ImageNet, so it performs a bit faster than larger classifiers, but it won't accurately guess anything outside of what it has been trained on.
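As a hedged sketch of what that example is doing (method names can differ slightly between ml5 versions), we can reuse the same promise-based pattern the video classifier later in these notes uses, but with a single image element. The 'inputImg' id is an assumption for illustration.

const img = document.getElementById('inputImg'); // the <img> we want classified

ml5.imageClassifier('MobileNet', img)
  .then((classifier) => classifier.predict())
  .then((results) => {
    // results is an array of guesses, most likely first
    console.log(results[0].className, results[0].probability.toFixed(4));
  });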


Training our first model

  • Open the 01-image-classifier.html example, and its corresponding javascript file: assets/js/01-image-classification.js.
  • Make sure you can see yourself in the webcam window
  • Make a pose (pretend you’re a dog?) and hit the add pose1 image button about 10 - 20 times. You can move around or change your pose slightly.
  • Make a different pose (pretend you’re a cat) and hit the add pose 2 image button 10-20 times.
  • Click train and wait for it to train your model.
  • Make a pose of your choice and hit start guessing. The AI should be able to classify whether you are posing like a dog or a cat.

Looking at the Javascript file, we can examine what’s happening behind the scenes:

let featureExtractor; //the thing that allows us to make assumptions about a specific group
let classifier; //the thing that separates those assumptions into one group vs. another
let video; //the webcam image
let loss; //when comparing training vs. validation, how well is the model doing per iteration of optimization
let pose2Images = 0;
let pose1Images = 0;

Here we have declared a number of variables. We declare them at the top here so they will be able to be seen by all of the code that follows (global variables).

The last two variables keep a running count of how many images we have stored for each pose so we can make sure they are all accounted for when we train our model.

function setup() {
  noCanvas(); //we don't actually want to display anything other than webcam on the screen
  video = createCapture(VIDEO); //Create a video element
  video.parent('videoContainer'); // Append it to the videoContainer DOM element
  featureExtractor = ml5.featureExtractor('MobileNet', modelReady); // utilize MobileNet's abilities to extract features from images
  classifier = featureExtractor.classification(video, videoReady); // Create a new classifier to identify those features and give it our video to analyze
  setupButtons(); // Create the UI for our user to click on to make things happen
}

Because this example uses p5, the p5 library automatically looks for a function called setup, and runs it before any other function to make sure all of our necessary variables and methods are initialized. Afterward, it will look for a draw function (if it exists) which it will call in a loop in order to update the onscreen (canvas) elements.

We call the createCapture method in order to activate our webcam and make it accessible to p5.

We then assign the ml5 featureExtractor method to a variable, and specify that we will be using MobileNet as the architecture for feature extraction.

Lastly we specify the video as the source for classification, assigning it to the classifier variable, and call the setupButtons method in order to enable interactivity.

function setupButtons() {
  // When the pose1 button is pressed, add the current frame as pose 1
  buttonA = select('#pose1Button');
  buttonA.mousePressed(function() {
    classifier.addImage('pose1');
    select('#numPose1Images').html(pose1Images++);
  });
  // When the pose2 button is pressed, assign the current frame as pose 2
  buttonB = select('#pose2Button');
  buttonB.mousePressed(function() {
    classifier.addImage('pose2');
    select('#numPose2Images').html(pose2Images++);
  });
  // Train Button
  train = select('#train');
  train.mousePressed(function() {
    classifier.train(function(lossValue) {
      if (lossValue) { //what is the term for what is happening here?
        loss = lossValue;
        select('#loss').html('Loss: ' + loss);
      } else { //we are at the bottom of our descent
        select('#loss').html('Done Training! Final Loss: ' + loss);
      }
    });
  });
  // Predict Button
  buttonPredict = select('#buttonPredict');
  buttonPredict.mousePressed(classify);
}

Above is our setupButtons method. It tells buttons A and B to capture the current frame of the camera feed and label it as either pose 1 or pose 2. It then tells the train button to pass those captured images to the ml5 classifier and begin training.

When ml5’s train method is called, it repeatedly passes our callback a loss value, which is essentially just the calculated error incurred during training. When a loss value is no longer returned (remember gradient descent? We’re looking for the bottom of the hill), we let the user know training is complete. At this point, the user can click the predict button to have the model start guessing which pose we are making, which calls the classify method.

function classify() {
  classifier.classify(gotResults);
}

// Show the results
function gotResults(err, result) {
  if (err) {
    console.error(err);
  }
  select('#result').html(result);
  classify();
}

Here, we’re just calling the ml5 classify function, and when ml5 makes its classification, it will call whatever function we gave it as a parameter (in this case, gotResults).  Our gotResults function accepts two parameters, which we’re calling err and result - if we have any errors, this function will display them in our browser’s console. Otherwise, it will display its perceived classification in our html element with the ID result, and call classify again in perpetuity, so that we are always guessing what pose a user is making.

Video Classifier

HTML File: 02-video-classifier.html

This example is very similar, except that instead of a live video stream, we are just classifying objects in a pre-recorded video. The code is much simpler.

ml5.imageClassifier('MobileNet', video)
      .then((classifier) => {
        loading.innerText = "";
        video.play();
        loop(classifier);
      });

We’re again using MobileNet for classification, and once our classifier is ready (.then((classifier)=>{ … }), we tell the HTML video element to play. Once we’ve done that, we start a loop in which we pass our classifier so that it can make classifications of the images it sees in each frame of the video (loop(classifier)).

const loop = (classifier) => {
        classifier.predict()
          .then(results => {
            result.innerText = results[0].className;
            probability.innerText = results[0].probability.toFixed(4);
            loop(classifier) // Call again to create a loop
          })
      }

In our loop, we call ml5’s predict function, and when we have a result, we pass it to the result html element. We should note that results is not a single result, but an array of potential results. Because ml5’s most likely guess is always the first result, we are grabbing its class name with results[0]. If we wanted the second most likely result, we’d say results[1].

Also to note, each result is not just some text, but a javascript object, which can have many properties. className is just one of those properties. We also have a probability property - a value between 0 and 1 indicating how confident the model is that its prediction is accurate. The toFixed(4) just tells javascript that we want this decimal number rounded to 4 places. Once we have our probability, we call the loop method again, making sure to provide it with our classifier.
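As a small, hedged variation on the loop above, we could display the top three guesses instead of just the first. The 'topThree' element id is an assumption.

const loop = (classifier) => {
  classifier.predict()
    .then((results) => {
      const topThree = results
        .slice(0, 3)
        .map((r) => `${r.className} (${r.probability.toFixed(4)})`)
        .join(', ');
      document.getElementById('topThree').innerText = topThree;
      loop(classifier); // call again to keep classifying new frames
    });
};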

Pre-trained webcam classifier

HTML File: 03-webcam-classifier.html

This is very similar to the previous two examples. The only difference is that instead of doing any training on the webcam feed and assigning classifications, we’re just using the pre-baked classifications provided to us with MobileNet.

Image Regressor

HTML File: 04-image-regressor.html
Javascript File: assets/js/04-image-regressor.js

Remember the difference between classification and regression? One assigns a label, the other a numerical value. In this example, we’ll use that numerical value in order to control the position of an on-screen object (in this case, a red square).

The structure of the HTML should be pretty familiar at this point: a videoContainer to hold our webcam feed, some status elements (videoStatus and modelStatus) to let the user know our progress, and an addSample button that allows us to store frames of video for analysis by the regressor. The train button signals we’re done collecting samples and want to train our model, and the buttonPredict button tells the trained model that we want to start predicting in real-time.

The only new element here is the input element, which is of type range, meaning it’s a slider. We give the slider a minimum value of .01, a max of 1.0, a step of .01 (meaning the slider moves in increments of .01 as the user drags it), and a default value of .5, putting the slider right in the middle.

We’re going to move our hand (or head, or some other object) in the video feed to the left, right, or center of the camera. The position of the slider should match the position of our object in the webcam feed - in other words when our head is all the way to the left, the slider should be all the way to the left before we hit the addSample button. 

The point of the slider in this example is to let our regressor know to what value the corresponding image should be assigned. In so doing, when we train the model and begin predicting, our model should be able to take what it has learned and make assumptions about the current image it is seeing. In real time, when our head is to the left, it should be able to give that a value according to how far to the left it is, and pass that value to the red square’s X position. Let’s look at the javascript:

let featureExtractor;
let regressor;
let video;
let loss;
let slider;
let samples = 0;
let positionX = 140;

Again, we have a bunch of variables that should be familiar. positionX is set to 140 so that the red square starts out roughly centered on the canvas. You’ll see the overall width of our canvas specified in the setup function:

function setup() {
  createCanvas(340, 280);
  // Create a video element
  video = createCapture(VIDEO);
  // Append it to the videoContainer DOM element
  video.hide();
  // Extract the features from MobileNet
  featureExtractor = ml5.featureExtractor('MobileNet', modelReady);
  // Create a new regressor using those features and give the video we want to use
  regressor = featureExtractor.regression(video, videoReady);
  // Create the UI buttons
  setupButtons();
}

Here, on top of the video feed, we’re making an HTML5 canvas element of width 340 pixels and height 280 pixels. We’re setting up our featureExtractor, then specifying that we’ll be using the regression method, to which we pass our video feed, and a function to call when the regressor is ready (videoReady). Then we set up our interactivity.

// A util function to create UI buttons
function setupButtons() {
  slider = select('#slider');
  select('#addSample').mousePressed(function() {
    regressor.addImage(slider.value());
    select('#amountOfSamples').html(samples++);
  });
  // Train Button
  select('#train').mousePressed(function() {
    regressor.train(function(lossValue) {
      if (lossValue) {
        loss = lossValue;
        select('#loss').html('Loss: ' + loss);
      } else {
        select('#loss').html('Done Training! Final Loss: ' + loss);
      }
    });
  });
  // Predict Button
  select('#buttonPredict').mousePressed(predict);
}

This just ties our html buttons to javascript actions. When we click addSample, we get the value of the slider (slider.value()), and increment our number of samples (samples++). We pass the regressor the image (regressor.addImage(slider.value())) so that it can store it for training. 

When we’re ready to train, we hit the button (regressor.train(function(lossValue){ … })), and when a loss value is no longer returned, we know our model is trained and we’re ready to predict.

When we click predict, we call our predict function.


// A function to be called when the model has been loaded
function modelReady() {
  select('#modelStatus').html('Model loaded!');
}
// A function to be called when the video has loaded
function videoReady() {
  select('#videoStatus').html('Video ready!');
}

Our video and model ready functions just let the user know they’re good to go, so that we can begin collecting samples. If we wanted to get fancy, we could prevent sample collection until we knew for sure these methods had been called.

// Classify the current frame.
function predict() {
  regressor.predict(gotResults);
}

This function just gets called whenever we hit the predict button, and calls ml5’s predict method. When it has a result, it calls the gotResults function.

// Show the results
function gotResults(err, result) {
  if (err) {
    console.error(err);
  }
  positionX = map(result, 0, 1, 0, width);
  slider.value(result);
  predict();
}

As long as there are no errors, it takes the predicted value (result), which is a number between 0 and 1, and turns it into a canvas position (positionX) between 0 and the canvas width (width) with p5’s map function. It then moves the slider to match the prediction and, accordingly, the square as well. Then it calls predict again in a loop.
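A quick illustration of p5’s map() (not from the example files): map(value, start1, stop1, start2, stop2) rescales a value from one range to another.

let x = map(0.25, 0, 1, 0, 340); // a prediction of 0.25 on a 340px-wide canvas
console.log(x); // 85 - a quarter of the way across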

function draw() {
  image(video, 0, 0, 340, 280);
  noStroke();
  fill(255, 0, 0);
  rect(positionX, 120, 50, 50);
}

Remember, because we’re using p5, it will automatically look for the existence of a draw function and call it in a loop. Here we have one, and in it we draw the current frame of our video feed onto the canvas at the top left (0,0), with a width of 340 and a height of 280 (the third and fourth arguments are a width and height, not a bottom-right coordinate). If we were only concerned with a specific portion of the video feed, say a 50px square at the center, we could use image()’s nine-argument form, which lets us specify a source rectangle (see the sketch below). We then want to draw our red box. It will have no line around it (noStroke()), but will have a red fill (fill(255,0,0)), with the 3 numbers corresponding to red, green, and blue, respectively.

Then we draw our rectangle, placing it at the appropriate positionX x value, a y value of 120, and having a 50x50 pixel size.
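For reference, here is a hedged sketch of drawing only a portion of the feed inside draw(), using image()’s nine-argument form: image(img, dx, dy, dWidth, dHeight, sx, sy, sWidth, sHeight). We assume here that we want the 50x50 square at the center of a 340x280 frame, drawn at the top left of the canvas at its original size.

image(video, 0, 0, 50, 50, 145, 115, 50, 50); // destination x, y, w, h, then source x, y, w, h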

Variable Font Example - https://tinyurl.com/y7v5a9uv
Variable Font JS - https://tinyurl.com/y8kqpqc6

The above code uses an image regressor to change the size and weight of a variable font based on a user’s proximity to the webcam.


Style Transfer


Recomposing images in the style of other images using Convolutional Neural Networks.



What is a style? 

In this case, it is the set of correlations among an image’s features across space (i.e. how parts of an image relate to other parts). Style loss is generally calculated with a Gram matrix - essentially just a matrix of dot products between the feature maps of a style layer. If two images have the same Gram matrix, they have the same style (though not necessarily the same content).
CNNs build information layer by layer, with higher layers building on the representations learned by lower layers.
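Here is a minimal sketch (not from the class code) of a Gram matrix for one style layer. We assume features is an array of flattened feature maps, one per channel; each entry of the Gram matrix is the dot product of two of those feature maps.

function gramMatrix(features) {
  const n = features.length;
  const gram = [];
  for (let i = 0; i < n; i++) {
    gram.push([]);
    for (let j = 0; j < n; j++) {
      let dot = 0;
      for (let k = 0; k < features[i].length; k++) {
        dot += features[i][k] * features[j][k]; // dot product of feature map i and feature map j
      }
      gram[i].push(dot);
    }
  }
  return gram;
}

// Two tiny made-up "feature maps":
console.log(gramMatrix([[1, 0, 2], [0, 1, 1]])); // [[5, 2], [2, 2]]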

In Style Transfer, we compare the features of a source (content) image to the features of a reference (style) image and arrive at a new image by deciding how heavily we want to favor the content versus the style. We calculate a weighted sum of the content loss and the style loss, choosing which we want to favor.
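A hedged sketch of that weighted sum: alpha and beta decide how much we favor preserving the content versus matching the style. The specific values here are arbitrary, for illustration only.

const alpha = 1;    // content weight (assumed)
const beta = 1000;  // style weight (assumed)
const totalLoss = (contentLoss, styleLoss) => alpha * contentLoss + beta * styleLoss;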

Remember, we can optimize this algorithmically with gradient descent - calculating, at each step, how far off we are from a good result (the cost) and adjusting in the direction that reduces it.

To create our own style with which to render images, we need to train a model. This is a very time and processor intensive process, therefore it is important to have a good GPU.  Or multiple GPUs! Relying on the CPU could take months!

Why GPUs? 

Just because it’s a graphics processing unit doesn’t mean it has to process graphics - but it is well suited for it. CPUs take large amounts of data and perform orderly, largely sequential calculations and operations on it - they are systematic. GPUs perform many small operations simultaneously - they run in parallel (see multi-threading).

Why is ML so processor intensive? 

Our filter in this case has a width, a height, and a depth. For example, 5x5x3, where we are sampling 5 pixels in width by 5 pixels in height by 3 color channels - red, green, and blue.
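A back-of-the-envelope illustration (the image size and filter count below are assumptions, not from the notes): one 5x5x3 filter performs 75 multiplications per output position, and a convolutional layer slides many such filters over every position.

const opsPerPosition = 5 * 5 * 3;                    // 75 multiplications per filter, per position
const opsPerLayer = opsPerPosition * 224 * 224 * 64; // ~241 million, for a 224x224 image and 64 filters
console.log(opsPerPosition, opsPerLayer);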

This takes a ton of memory. Why? We have to store the results of previous tests to compare them to current ones if we are to have successful learning.

Style Transfer Examples



Basic Style Transfer 

HTML File: 05-style-transfer/00-style-transfer.html
JS: 05-style-transfer/js/00-style-transfer.js

In this example, we have three pre-trained models that allow us to convert an image of our choosing to the style of the model (Great Wave off Kanagawa, Udnie, and a model I trained on one of my own videos) using p5.js.

Our HTML file is pretty basic. It contains a status message that we will update depending on what stage of the style transfer we’re in:

<p id="statusMsg">Loading Models...</p>

a button that allows us to initiate style transfer once the models are loaded:

<button id="transferBtn">Transfer!</button>

The image we are going to apply our styles to:

<img src="img/patagonia.jpg" alt="input img" id="inputImg">

And our various style reference images:

<div id="styleA">
    <p>Style A: <a href="https://en.wikipedia.org/wiki/The_Great_Wave_off_Kanagawa">The Great Wave off Kanagawa, 1829 - Katsushika Hokusai</a></p>
    <img src="img/wave.jpg" alt="style one">
</div>

Note that each image is contained within a div (html container element), with an id (styleA). This will allow us to later place the fully processed new images alongside their references.

In the javascript, we declare a bunch of placeholder variables referencing our html elements:

let inputImg;
let statusMsg;
let transferBtn;

as well as some for our ml5 styles:

let style1;
let style2;
let style3;

We can tie the html placeholder variables to actual html elements as follows:

inputImg = select('#inputImg');

where select is a p5 shortcut method that allows us to pass the html id of our image (id="inputImg").

We then instantiate those style variables in order to load the style transfer model as follows:

style1 = ml5.styleTransfer('models/wave', modelLoaded);

modelLoaded is the name of a callback function that fires when a model has been loaded. Because all three styles use the same callback, we can check inside it whether every model is ready and only proceed once all three are loaded:

function modelLoaded() {
  // Check if both models are loaded
  if(style1.ready && style2.ready && style3.ready){
    statusMsg.html('Ready!')
  }
}

Here we’re just printing to the screen that everything is ready.

When we hit the transfer button, we use ml5’s transfer method to capture the information of the chosen photo, and apply the appropriate model to create a new image in that style:

style1.transfer(inputImg, function(err, result) {
    createImg(result.src).parent('styleA');
  });

The createImg function is part of p5, and the parent allows us to specify the id of the HTML container element into which we want to place the result image.

Applying one style to many images

HTML File: 05-style-transfer/01-style-transfer-multiple.html
JS: 05-style-transfer/js/01-style-transfer-multiple.js

This example is basically the same as the last, with the exception that we are applying only one style to many images instead of many styles to one image.

Applying a style to a sequence of images / animated GIF

HTML File: 05-style-transfer/02-style-transfer-sequence.html
JS: 05-style-transfer/js/00-style-transfer-sequence.js

Similarly, if we extract frames of a video or animation, we can apply the style to them sequentially. Here, the major difference is just that we’re using a javascript loop to call the transfer method:

for (let i=0;i<numImages;i++){
    style1.transfer(inputImgArr[i], function(err, result) {
      if (result){
        createImg(result.src).parent('output-images');
      }
    });
  }

We declare a variable numImages that corresponds to the number of image frames (<img src="img/dance/01.jpg" alt="input img" id="inputImg1">, etc.) we want to process, and a variable i to keep track of which image we’re on. The for loop increments i, calls transfer on each image in turn, and appends each result to our html page as it finishes, until all images have been processed.

If we had an exceptionally large number of images, we could even make a data file with all of the image names ( 01.jpg, 02.jpg, etc.), and use p5 to create those images on our html page rather than listing them all out manually.
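A hedged sketch of that idea: generate the image elements from a list of filenames with p5’s createImg() instead of writing them out by hand. The filenames and the 'input-images' parent id are assumptions for illustration.

const frameNames = ['01.jpg', '02.jpg', '03.jpg']; // could also come from a JSON data file via loadJSON()
const inputImgArr = frameNames.map((name, i) =>
  createImg('img/dance/' + name, 'input img ' + (i + 1)).parent('input-images')
);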

A good tool for creating the animated GIF as shown on this example is ImageMagick. We won’t cover that here.

Style Transfer for Video

HTML File: 05-style-transfer/03-style-transfer-vid.html
JS: 05-style-transfer/js/01-style-transfer-vid.js

Of course, ml5 can also apply style transfer to videos, and with a little ingenuity we can record those. NOTE: This is very processor intensive - on my MacBook Pro, the frame rate is incredibly low and thus the video jerky.

In this example, we’re creating an html5 canvas element to render our stylized video:

createCanvas(320, 240).parent('canvasContainer');

where 320 is the width, 240 is the height, and canvasContainer is our html container id.

We use p5’s createVideo function to pass the path to our video file and have it create a player:

video = createVideo('/assets/videos/house.mov');

and when we press the start/stop button, we begin applying the style transfer:

select('#startStop').mousePressed(startStop);

function startStop() {
  if (isTransferring) {
    select('#startStop').html('Start');
    video.pause();
  } else {
    select('#startStop').html('Stop');
    video.loop();
    // Make a transfer using the video
    style.transfer(gotResult); 
  }
  isTransferring = !isTransferring;
}

Here, isTransferring is just a true/false (boolean) variable that keeps track of whether we are viewing the stylized or unstylized video (started or stopped, respectively), and gotResult is the function we want to run each time a frame of the video has been successfully stylized:

function gotResult(err, img) {
  resultImg.attribute('src', img.src);
  if (isTransferring) {
    style.transfer(gotResult); // keep going while we're in the "transferring" state
  }
}

You’ll notice above that the result is still an image, but the html canvas allows us to string these images together as a video (which is all a true video player really does in the first place).

The only other thing to examine is the draw function - which is a default function name for p5 that runs on a loop. In this example, our loop is either displaying the video or the stylized image, depending on whether isTransferring is true or false:

function draw(){
  // Switch between showing the raw camera or the style
  if (isTransferring) {
    image(resultImg, 0, 0, 320, 240);
  } else {
    image(video, 0, 0, 320, 240);
  }
}

Saving the Results

HTML File: 05-style-transfer/04-style-transfer-record.html
JS: 05-style-transfer/js/04-style-transfer-record.js

There isn’t a way within ml5 or p5 to record the video as of the time of this writing, but we can save out each frame and assemble them into a video later with p5’s saveCanvas function. All this does is take the image on the canvas at the time of calling the function, and save it out to a file:

saveCanvas(c, `kinect${i}`, 'jpg');

Note that trying to specify a specific image name doesn’t actually seem to work in the browser, as each browser has its own way of handling saved images by default, which override our attempts to give them a name (kinect1.jpg, kinect2.jpg, etc). Also note that, for my browser at least, the default location is the desktop, which will fill up quickly with images if you let this run.
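A hedged sketch of how those pieces might fit together (not the actual contents of 04-style-transfer-record.js): call saveCanvas() inside the stylized-frame callback with a counter, so the requested filenames at least increment in order (subject to the browser-naming caveat above). We assume c is the canvas returned by createCanvas(), as in the earlier snippet.

let savedFrames = 0;
function gotResult(err, img) {
  resultImg.attribute('src', img.src);
  saveCanvas(c, `kinect${savedFrames++}`, 'jpg'); // one file per stylized frame
  if (isTransferring) {
    style.transfer(gotResult);
  }
}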

Creating your own style


Luckily, there are many cloud computing platforms that allow you to do your model training on machines dedicated to this sort of thing, and then download the result.

We will be using Paperspace's Gradient. The style transfer training uses the COCO dataset - 15GB of images labeled for object detection.

Process

  • Install dependencies (Git, NPM, PIP/Python)
  • Install paperspace API on your machine:

npm install -g paperspace-node

or

pip install paperspace

or download here.

  • Log in to paperspace in your console:

paperspace login

  • Pick an image from your computer that you want as the reference style. Make it no bigger than 500x500 pixels.

  • Update the python script to reference that image.
  • In run.sh, change the path of --style images/[YOUR_IMAGE] to the filename of the image you want for styling all other images.

  • Create a paperspace job to process the training in the cloud. This will take a few hours.

paperspace jobs create --container cvalenzuelab/styletransfer --machineType P5000 --command './run.sh' --project 'Style Transfer training'


  • --container refers to a docker image that is pre-configured to do our style transfer processing. It installs all the stuff to train the model on the gradient cloud computer so we don't have to install it ourselves on our own machines.

  • --machineType is a reference to the computer we want to use for processing. Faster machines cost more money. P5000 is a mid-tier machine.

  • --command runs the run.sh script that we edited.

  • --project can be whatever we want to call this so we have a reference in Paperspace.


  • Download the model and put it in our project:

From within the models directory of our local project, run paperspace jobs artifactsGet --jobId YOUR_JOB_ID, where the job ID can be found in Paperspace under Gradient > Jobs in the Machine Type / JOBID column. Alternatively, click the item under the Project column, click download workspace, and then copy the contents of the artifacts folder into a new directory within the models directory of your local project.
  • Point the sample project to our new model directory

In sketch.js, change const style = new ml5.styleTransfer('./models/YOUR_NEW_MODEL'); to reference the name of the directory you made for your artifact from the step above.

  • Run, and view the result!

  • From the directory containing sketch.js, run python -m SimpleHTTPServer (python2) or python -m http.server (python3).

Tuning the Results

You can tweak the settings of run.sh to get better / faster results based on your dataset. In general:

rnn_size: the number of neurons; should increase the larger your dataset is.
layers: should never be less than 2 or more than 3.
seq_length: the length of each sample of text for use in generation; should get larger with dataset size.
batch_size: the amount of text to process with each iteration; should increase with dataset size.
dropout: used to prevent overfitting by ‘ignoring’ neurons. The lower the dropout, the greater the chance you will overfit your data (the higher the likelihood that we will sample directly from the data set verbatim). The higher the dropout, the more random things will be. Higher dropout requires more iterations (but lower epoch time per iteration).

Style Transfer References


Current Examples of Machine Learning for Images/Video



SFV - Skills from Videos

Teaching AI-controlled 3D characters to learn motion-based skills from videos using reinforcement learning.


More Info

CycleGAN

Learning the mapping between an input image and an output image when paired training data is not available. Cycle refers to consistency - if we translate an image into the other domain and then translate it back, we should end up where we started. Technically, this means that the two translators should be inverses of one another, and we enforce this by including a cycle-consistency term in the loss function (source)
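As a toy, hedged illustration of cycle consistency (arrays of numbers standing in for images, and deliberately simple "translators"): translating forward and back should return the original, and the cycle loss measures how far off we are.

const l1 = (a, b) => a.reduce((sum, v, i) => sum + Math.abs(v - b[i]), 0); // L1 distance
const G = (x) => x.map((v) => v * 2); // hypothetical translator A -> B
const F = (y) => y.map((v) => v / 2); // hypothetical translator B -> A (G's inverse)

const x = [0.1, 0.5, 0.9];
console.log(l1(F(G(x)), x)); // 0 - perfect cycle consistency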


Style Transfer For-Ev-Er
From the original creator of Deep Dream


FOR NEXT CLASS


  • Bring in an example of style transfer trained on an image of your choosing.

  • Try to find a large body of text from which you want to work. The bigger the better (we’re talking a book’s worth. Multiple books, even better). Try to find something very stylistic and unique, as we’ll be generating text in that style. We will need to be able to put all that text into a file (or download a file).

  • Come up with a project you might like to do with machine learning and images. You don’t have to make it, but we will share our ideas in class. If you come across anything cool that’s currently being done in ML, we’ll share that as well.