Essential guide to binning
25/May 2012
I often find myself fighting with data binning, trying to understand the relation between linear and logarithmic bins and how to create the bins starting from either the number of bins or the bin spacing.
It’s time to write down some considerations and snippets!
To be updated…
Linear vs logarithmic
I live in a linear space. My advisor and a lot of other scientists live in a logarithmic space. It’s quite difficult to communicate easily, but by “masking” this difference life can be more peaceful.
Hereafter I would like to think about “equally spaced bins”. It doesn’t matter whether they are equally spaced in linear or in logarithmic space, because you can take the same snippet of code and pass it a logarithmic array, or logarithmic boundaries.
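A minimal sketch of this idea, assuming NumPy: the same equally spaced construction is applied once to linear boundaries and once to their base-10 logarithms. The range from 1 to 1000 and the three bins are illustrative values chosen only for this example.

```python
import numpy as np

# Illustrative values (assumptions, not taken from real data)
inf, sup, n_bin = 1.0, 1000.0, 3

# Equally spaced bin limits in linear space
lin_limits = np.linspace(inf, sup, n_bin + 1)

# The same call on logarithmic boundaries gives limits that are
# equally spaced in log10 space...
log_limits = np.linspace(np.log10(inf), np.log10(sup), n_bin + 1)

# ...and raising 10 to them maps the limits back to linear values
log_spaced_limits = 10.0 ** log_limits
```

Here each logarithmic bin spans one decade, while each linear bin spans 333 units.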
From one to the other
Suppose you have an array whose values lie between a lower limit inf and an upper limit sup. What bin spacing do you need to obtain n_bin bins?
It can be easily computed as
$\delta_{bin} = (sup-inf)/n_{bin}$
From this it’s also straightforward to obtain the number of bins from the spacing:
$n_{bin} = \lfloor(sup-inf)/\delta_{bin}\rfloor$
Note that the number of bins has to be an integer, hence the floor.
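The two formulas can be checked with a few lines of Python. This is only a sketch: the function names spacing_from_n_bin and n_bin_from_spacing are hypothetical, and the limits 0 and 10 are made-up values.

```python
import math

def spacing_from_n_bin(inf, sup, n_bin):
    """Bin spacing needed to split [inf, sup] into n_bin equal bins."""
    return (sup - inf) / float(n_bin)

def n_bin_from_spacing(inf, sup, delta_bin):
    """Number of whole bins of width delta_bin that fit in [inf, sup]."""
    return int(math.floor((sup - inf) / float(delta_bin)))

# Illustrative limits (assumed values)
inf, sup = 0.0, 10.0

delta_bin = spacing_from_n_bin(inf, sup, 4)   # 10 / 4
n_bin = n_bin_from_spacing(inf, sup, 3.0)     # floor(10 / 3)
```

Be careful with floating-point rounding: a range that looks like an exact multiple of the spacing can still floor to one bin fewer, because a division such as 0.3/0.1 evaluates to slightly less than 3.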
Code
Here is some code to bin your arrays:
#!/usr/bin/env python
import sys
import numpy as np
def binning(inf, sup, n_bin=None, delta_bin=None):
    """Given the inf and sup limits of an array and the number of equally
    spaced bins (or the bin spacing), return the bin centers, the bin limits
    and the bin spacing.

    It's possible to have a linear or a logarithmic spacing by passing linear
    or logarithmic inf and sup and searchsorting on a linear or logarithmic
    array, or you can use a linear array and logarithmically spaced bins as
    new_bins = pow(10, logbins)
    """
    if (n_bin == 0) or (delta_bin == 0):
        print("Error, n_bin and/or delta_bin are/is zero, exit!")
        sys.exit()
    elif (n_bin is None) and (delta_bin is not None):
        # The number of bins must be an integer. The bins are then stretched
        # to cover [inf, sup], so the returned spacing can exceed delta_bin.
        n_bin = int(np.floor((sup - inf) / float(delta_bin)))
    elif (n_bin is None) and (delta_bin is None):
        print("Error, n_bin and delta_bin are both None, exit!")
        sys.exit()
    # 2*n_bin+1 equally spaced points: even indices are bin limits,
    # odd indices are bin centers
    temp, half_step = np.linspace(inf, sup, 2*n_bin+1, endpoint=True, retstep=True)
    xrange_limit = temp.size // 2
    bin_pos = np.zeros(xrange_limit)
    bin_limits = np.zeros(xrange_limit+1)
    for i in range(xrange_limit):
        bin_pos[i] = temp[2*i+1]
        bin_limits[i] = temp[2*i]
    bin_limits[-1] = temp[-1]
    del temp
    return [bin_pos, bin_limits, 2*half_step]
def base_binning(inf, sup, n_bin=None, delta_bin=None):
    """More C-like...
    """
    if (n_bin == 0) or (delta_bin == 0):
        print("Error, n_bin and/or delta_bin are/is zero, exit!")
        sys.exit()
    elif (n_bin is None) and (delta_bin is not None):
        n_bin = int((sup-inf)/(1.*delta_bin))
    elif (n_bin is not None) and (delta_bin is None):
        delta_bin = (sup-inf)/(1.*n_bin)
    elif (n_bin is None) and (delta_bin is None):
        print("Error, n_bin and delta_bin are both None, exit!")
        sys.exit()
    bin_pos = np.zeros(n_bin)
    bin_limits = np.zeros(n_bin+1)
    for i in range(n_bin):
        # Even iterations set two limits at once; odd ones reuse the limit
        # already written by the previous iteration
        if i%2 == 0:
            bin_limits[i] = inf + i * delta_bin
            bin_limits[i+1] = bin_limits[i] + delta_bin
        bin_pos[i] = bin_limits[i] + delta_bin/2.
    # With a given delta_bin the last limit can fall short of sup
    bin_limits[n_bin] = inf + n_bin * delta_bin
    return [bin_pos, bin_limits, delta_bin]
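As a possible alternative to the explicit loops, here is a loop-free sketch that returns the same three quantities, followed by an example of the searchsorting step mentioned in the docstring of binning. The name binning_vectorized and the sample data are hypothetical, introduced only for illustration.

```python
import numpy as np

def binning_vectorized(inf, sup, n_bin):
    """Hypothetical loop-free helper: bin centers, bin limits, bin spacing."""
    bin_limits = np.linspace(inf, sup, n_bin + 1)
    # Each center is the midpoint of two consecutive limits
    bin_pos = 0.5 * (bin_limits[:-1] + bin_limits[1:])
    delta_bin = (sup - inf) / float(n_bin)
    return bin_pos, bin_limits, delta_bin

bin_pos, bin_limits, delta_bin = binning_vectorized(0.0, 10.0, 5)

# Made-up sample data, all strictly below sup
data = np.array([0.5, 1.0, 2.5, 7.2, 9.9])

# Index k such that bin_limits[k] <= x < bin_limits[k+1].
# A value exactly equal to sup would get index n_bin and needs special care.
bin_index = np.searchsorted(bin_limits, data, side="right") - 1

# Number of samples falling in each bin
counts = np.bincount(bin_index, minlength=bin_pos.size)
```

For logarithmic bins, the same calls can be made on np.log10(data) with logarithmic inf and sup.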