The counting grid captures higher-order statistics in bag-of-words data. Rather than capturing co-occurrence statistics with a handful of topics as in LDA, it uses a large (e.g., 64×64) grid of micro-topics, estimated so that individual bags of words can be generated from the micro-topics in overlapping windows (e.g., 5×5). In this way, the model avoids explicitly constraining the topics to prevent over-fitting; over-fitting is instead kept in check by controlling the window size relative to the grid size. After maximum-likelihood training, the topics naturally shrink to be considerably less entropic than LDA topics, and they are arranged so that related concepts are mapped to nearby grid locations, leading to a variety of applications in vision, NLP, biology, and UI/visualization design.
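To make the generative step concrete, the following is a minimal NumPy sketch (not the authors' implementation) of how a single bag of words could be drawn from a window of micro-topics: the document's word distribution is the average of the micro-topic distributions inside a window placed on the grid. The grid size, window size, vocabulary size, and toroidal wrap-around are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the counting-grid generative step.
# GRID, WINDOW, and VOCAB are placeholder assumptions, not values from the paper.
GRID = (64, 64)     # extent of the counting grid
WINDOW = (5, 5)     # window of micro-topics averaged per document
VOCAB = 1000        # vocabulary size

rng = np.random.default_rng(0)

# pi[i, j] is a micro-topic: a distribution over the vocabulary at grid cell (i, j).
pi = rng.dirichlet(np.full(VOCAB, 0.1), size=GRID)   # shape (64, 64, VOCAB)

def window_distribution(pi, top_left, window=WINDOW):
    """Word distribution for a document mapped to `top_left`:
    the average of the micro-topics inside the window (toroidal wrap assumed)."""
    rows = [(top_left[0] + r) % pi.shape[0] for r in range(window[0])]
    cols = [(top_left[1] + c) % pi.shape[1] for c in range(window[1])]
    patch = pi[np.ix_(rows, cols)]                    # (5, 5, VOCAB) block of micro-topics
    return patch.reshape(-1, pi.shape[-1]).mean(axis=0)

# Generate a small bag of words for a document placed at grid location (10, 20).
h = window_distribution(pi, (10, 20))
bag = rng.multinomial(50, h)   # counts of 50 word tokens drawn from the window mixture
```

Because each window overlaps its neighbors, nearby grid locations share micro-topics and therefore produce similar word distributions, which is what drives related concepts to be mapped close together on the grid.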