GSoC 2020 - Project Proposal

About me

My name is Rasul, and I’m an MSc student at Skoltech, studying Data Science/Math. I also currently work as an assistant computer vision engineer at Samsung AI in Moscow. I took a number of courses on Bayesian inference during my bachelor’s and master’s degrees, so I would like to join a NumFOCUS sub-organization with relevant GSoC ideas, to expand my knowledge and become part of its growing community.

I’ve been following the pymc-devs/tfp projects for quite some time, and I would love to help develop PyMC4 on top of TensorFlow Probability. I believe I have the skills required to complete the project. I’ve invested time in understanding the code of TensorFlow Probability, PyMC3, and PyMC4, and I have communicated extensively with the development community. I love the Bayesian field, and I expect that learning the theoretical properties of the methods I plan to implement this summer will not be too hard.

Project - MCMC support (support for more samplers)

The project is to add more comprehensive support for MCMC sampling methods in PyMC4. Currently there is no support for discrete distributions in a model, no support for gradient-free samplers, and so on. The goal of the project is to extend this support by adding various samplers, including SMC (Sequential Monte Carlo). From a design perspective, there is also a need for compound step/Gibbs sampling. This work requires collaboration with the TensorFlow Probability developers to find the most effective solution.
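To illustrate why a compound step matters once a model contains discrete variables, here is a minimal, framework-free sketch in plain Python. It does not use PyMC4 or tfp, and the model and all function names are hypothetical. The model is z ~ Bernoulli(0.5), x | z ~ Normal(MU[z], 1). A gradient-based sampler cannot update the discrete z, so each sweep composes two step methods, in the spirit of PyMC3’s CompoundStep: an exact Gibbs update for z given x, and a gradient-free random-walk Metropolis update for x given z.

```python
import math
import random

MU = (-1.0, 1.0)  # hypothetical component means for z = 0 and z = 1


def log_normal(x, mu, sigma=1.0):
    """Log density of Normal(mu, sigma) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def gibbs_step_z(x, rng):
    """Draw the discrete z from its exact conditional p(z | x)."""
    # The uniform prior p(z) = 0.5 cancels in the ratio.
    logp0 = log_normal(x, MU[0])
    logp1 = log_normal(x, MU[1])
    p1 = 1.0 / (1.0 + math.exp(logp0 - logp1))
    return 1 if rng.random() < p1 else 0


def metropolis_step_x(x, z, rng, scale=1.0):
    """One random-walk Metropolis update of x targeting p(x | z)."""
    proposal = x + rng.gauss(0.0, scale)
    log_accept = log_normal(proposal, MU[z]) - log_normal(x, MU[z])
    if rng.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return x


def compound_sample(num_samples, seed=0):
    """Alternate the two step methods; each leaves the joint p(z, x) invariant."""
    rng = random.Random(seed)
    x, z = 0.0, 0
    trace = []
    for _ in range(num_samples):
        z = gibbs_step_z(x, rng)
        x = metropolis_step_x(x, z, rng)
        trace.append((z, x))
    return trace
```

Calling `compound_sample(20000)` should give a fraction of draws with z = 1 close to 0.5 and a mean of x close to 0, by the symmetry of the model. In PyMC4 the open question is how to express this per-variable alternation on top of tfp’s batched kernels.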

I believe the starting point of the project should be support for the basic samplers already implemented in tfp. I include my design ideas here so they can be discussed and improved. Next, an interface needs to be designed for step methods and for samplers based on conditional probabilities, like Gibbs. I'm not sure this is feasible given how heavily tfp relies on batching, but the issue is worth discussing further. The user experience should also be improved substantially, from a progress bar to trace plotting. I see the project including the following features:

  • Support for more samplers; this is tricky because of multimodal/discrete distribution support.
  • Support for a sampler method assigner, as in PyMC3.
  • Support for samplers optimized for specific cases, e.g. BinaryMetropolis (this may belong in tfp).
  • Add support for SMC.
  • Fix the issues with discrete distribution sampling and design a more user-friendly interface. Additionally, fix issues with XLA.
  • Support for the CompoundStep and Gibbs step methods, although whether they are needed with the current design should be discussed.
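The sampler method assigner could work roughly like the competence-based assignment in PyMC3: each step method reports how suitable it is for a variable, and each variable goes to the most competent method. The sketch below is hypothetical and framework-free; `VarInfo`, the competence functions, and the method names are illustrative assumptions, not an existing PyMC4 or tfp API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Competence(IntEnum):
    """How suitable a step method is for a variable (higher is better)."""
    INCOMPATIBLE = 0
    COMPATIBLE = 1
    PREFERRED = 2
    IDEAL = 3


@dataclass(frozen=True)
class VarInfo:
    """Hypothetical description of a free model variable."""
    name: str
    is_discrete: bool
    is_binary: bool = False


def nuts_competence(var):
    # Gradient-based: only usable for continuous variables.
    return Competence.INCOMPATIBLE if var.is_discrete else Competence.IDEAL


def metropolis_competence(var):
    # Gradient-free random walk: works for any variable but is rarely the best choice.
    return Competence.COMPATIBLE


def binary_metropolis_competence(var):
    # Specialized sampler for binary variables.
    return Competence.IDEAL if var.is_binary else Competence.INCOMPATIBLE


STEP_METHODS = {
    "NUTS": nuts_competence,
    "Metropolis": metropolis_competence,
    "BinaryMetropolis": binary_metropolis_competence,
}


def assign_step_methods(variables):
    """Group variable names under the step method with the highest competence."""
    assignment = {}
    for var in variables:
        best = max(STEP_METHODS, key=lambda name: STEP_METHODS[name](var))
        assignment.setdefault(best, []).append(var.name)
    return assignment
```

For example, a continuous `mu`, a binary `flag`, and a non-binary discrete `count` would be grouped as `{"NUTS": ["mu"], "BinaryMetropolis": ["flag"], "Metropolis": ["count"]}`. Each group would then be handled by one sampler instance, and the groups combined with a compound step.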

Ideas for design

To support multiple samplers, I propose refactoring the code around an abstract base class for all sampler subclasses:

import abc

class _BaseSampler(abc.ABC):
  # kwargs include all the keyword arguments for the kernel, the adaptive kernel (if it exists), and the sample_chain function
  def __init__(self, model, **kwargs):
    self.model = model
    self._assign_kwargs(kwargs)
    self._check_kwargs()

  def sample(self, *, num_samples, **sample_kwargs):  # all sampling arguments (num_chains, observed, and so on)
    """
      Sampling logic is defined here; for now the logic is the same for all subclasses.
    """

  @abc.abstractmethod
  def _trace_func(self):
    # defined in subclass
    ...

  def _run_chains(self):
    """
      All the logic with the kernel, adaptive kernel, and sampler function is implemented here.
    """

  # other logic (_assign_kwargs, _check_kwargs, ...)
  ...

Next, I think we need a class constructor to support the logic of samplers like CompoundStep; it may not be required, so this should be discussed. I’m still thinking about how Gibbs/CompoundStep could be implemented, so this remains an open question for now.

class SamplerConstr:  # the naming is bad, I know
  def __new__(cls, *args, **kwargs):
    # check args and kwargs for any issues
    # check for discrete variables if the sampler computes gradients
    if cls._name == "compound":
      sample_methods = kwargs.get("sample_methods")
      sampler_list = _assign_samplers(sample_methods)
      instances = []