

libNLP is a proprietary natural language processing (understanding) library developed at Squirro and powered by machine learning.

Documentation is hosted at

Install using pip:

pip install .


When adding new classes/steps, please also add them to the pydocmd.yml file. In the future, we will migrate it to


libNLP is structured as a pipeline: the user specifies a sequence of steps that load and transform unstructured data, which can then be classified, clustered, etc., and ultimately saved either to disk (CSV or JSON format) or to Squirro. The results of the libNLP pipeline can then be screened for quality using the provided analyzers.
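As a minimal sketch of this idea (not libNLP's actual API; the step names and document shape below are hypothetical), a pipeline can be modeled as a list of callables applied in order to a list of documents:

```python
# Hypothetical illustration of the pipeline concept: each step takes a
# list of documents (dicts) and returns a transformed list.

def lowercase_step(docs):
    # Transform step: normalize the text field.
    return [{**d, "text": d["text"].lower()} for d in docs]

def tag_step(docs):
    # Classification-like step: attach a tag based on the content.
    return [{**d, "tag": "iris" if "iris" in d["text"] else "other"} for d in docs]

def run_pipeline(docs, steps):
    # Apply each step in sequence, feeding its output to the next step.
    for step in steps:
        docs = step(docs)
    return docs

docs = [{"text": "Iris Setosa"}, {"text": "Something else"}]
result = run_pipeline(docs, [lowercase_step, tag_step])
```

In libNLP itself the steps are declared in a JSON configuration rather than composed in code, as shown next.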

The pipeline configuration is specified in JSON format. For example, to train a model on the canonical Iris flower data set, we can use the following:

  "dataset": {
    "train": "data/train",
    "test": "data/test"
  "analyzer": {
    "type": "classification",
    "tag_field": "pred_class",
    "label_field": "class"
  "pipeline": [{
    "step": "loader",
    "type": "csv",
    "fields": ["sepal length", "sepal width", "petal length", "petal width",
    "step": "classifier",
    "type": "sklearn",
    "input_fields": ["sepal length", "sepal width", "petal length", "petal width"],
    "label_field": "class",
    "model_type": "SVC",
    "model_kwargs": {"probability": true},
    "output_field": "pred_class",
    "explanation_field": "explanation"

This and other simple workflows can be found in the examples directory.
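Because the configuration is plain JSON, it can be loaded and sanity-checked with Python's standard library before it is handed to the pipeline. The sketch below is not part of libNLP; the key names are taken from the example configuration above, and the checks are illustrative assumptions:

```python
import json

# The pipeline configuration, embedded as a string to keep this example
# self-contained (in practice it would live in a .json file).
raw = """
{
  "dataset": {"train": "data/train", "test": "data/test"},
  "analyzer": {"type": "classification",
               "tag_field": "pred_class", "label_field": "class"},
  "pipeline": [
    {"step": "loader", "type": "csv",
     "fields": ["sepal length", "sepal width",
                "petal length", "petal width", "class"]},
    {"step": "classifier", "type": "sklearn",
     "input_fields": ["sepal length", "sepal width",
                      "petal length", "petal width"],
     "label_field": "class", "model_type": "SVC",
     "model_kwargs": {"probability": true},
     "output_field": "pred_class", "explanation_field": "explanation"}
  ]
}
"""
config = json.loads(raw)

# Basic structural checks: top-level sections are present and every
# pipeline entry declares a step name and a type.
assert {"dataset", "analyzer", "pipeline"} <= set(config)
for step in config["pipeline"]:
    assert "step" in step and "type" in step
```

A check like this catches malformed JSON and missing sections early, with a clearer error than a failure deep inside a pipeline run.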


To run the test suite with coverage:

pip install -e .[test]
pytest --cov-report term-missing --cov=squirro.lib.nlp