
libNLP

libNLP is a proprietary natural language processing and understanding library developed at Squirro and powered by machine learning.

Documentation is hosted at https://squirro.github.io/nlp/.

Install using pip from the repository root:

pip install .

Contributing

When adding new classes or steps, please also add them to the pydocmd.yml file. In the future, the documentation will be migrated to docs.squirro.com.

Overview

libNLP is structured as a pipeline: the user specifies a sequence of steps that load and transform unstructured data, which is then classified, clustered, and so on, and ultimately saved either to disk (in CSV or JSON format) or to Squirro. The results of the pipeline can then be screened for quality using the provided analyzers.
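The pipeline idea can be illustrated with a minimal sketch in plain Python. This is not libNLP's actual API: the names `run_pipeline`, `lowercase_body`, and `tag_length`, and the `body` field, are hypothetical and chosen only to show how each step receives the documents produced by the previous one.

```python
from typing import Callable, Dict, List

# Assumption for illustration: a document is a flat dict of named fields.
Document = Dict[str, object]
Step = Callable[[List[Document]], List[Document]]


def run_pipeline(steps: List[Step], docs: List[Document]) -> List[Document]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        docs = step(docs)
    return docs


def lowercase_body(docs: List[Document]) -> List[Document]:
    # Hypothetical transform step: normalize the "body" field.
    return [{**doc, "body": str(doc["body"]).lower()} for doc in docs]


def tag_length(docs: List[Document]) -> List[Document]:
    # Hypothetical stand-in for a classifier: label documents by text length.
    return [
        {**doc, "pred_class": "long" if len(str(doc["body"])) > 20 else "short"}
        for doc in docs
    ]


result = run_pipeline(
    [lowercase_body, tag_length],
    [{"body": "Hello World"}, {"body": "A considerably longer piece of text"}],
)
```

Each step adds or rewrites fields and leaves the rest untouched, which is why later steps (and analyzers) can refer to earlier results by field name.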

The pipeline configuration is specified in JSON format. For example, to train a model on the canonical Iris flower data set, we can use the following:

{
  "dataset": {
    "train": "data/train",
    "test": "data/test"
  },
  "analyzer": {
    "type": "classification",
    "tag_field": "pred_class",
    "label_field": "class"
  },
  "pipeline": [{
    "step": "loader",
    "type": "csv",
    "fields": ["sepal length", "sepal width", "petal length", "petal width",
               "class"]
  },{
    "step": "classifier",
    "type": "sklearn",
    "input_fields": ["sepal length", "sepal width", "petal length", "petal width"],
    "label_field": "class",
    "model_type": "SVC",
    "model_kwargs": {"probability": true},
    "output_field": "pred_class",
    "explanation_field": "explanation"
  }]
}

This and other simple workflows can be found in the examples directory.
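The Iris configuration connects its steps purely by field name: the classifier reads fields produced by the loader, and the analyzer compares the classifier's `output_field` with the `label_field`. A small standalone check can catch naming mismatches before a run. The sketch below uses only the Python standard library; `check_field_wiring` is a hypothetical helper, not part of libNLP, and it assumes the step keys shown in the example configuration.

```python
import json
from typing import Dict, List

CONFIG_JSON = """
{
  "analyzer": {"type": "classification", "tag_field": "pred_class",
               "label_field": "class"},
  "pipeline": [
    {"step": "loader", "type": "csv",
     "fields": ["sepal length", "sepal width", "petal length",
                "petal width", "class"]},
    {"step": "classifier", "type": "sklearn",
     "input_fields": ["sepal length", "sepal width", "petal length",
                      "petal width"],
     "label_field": "class", "output_field": "pred_class",
     "explanation_field": "explanation"}
  ]
}
"""


def check_field_wiring(config: Dict) -> List[str]:
    """Return a list of field-name problems; an empty list means none found."""
    problems: List[str] = []
    available = set()  # fields produced by the steps seen so far
    for step in config["pipeline"]:
        if step["step"] == "loader":
            available.update(step["fields"])
        elif step["step"] == "classifier":
            needed = set(step["input_fields"]) | {step["label_field"]}
            for name in sorted(needed - available):
                problems.append(f"classifier reads unknown field {name!r}")
            available.add(step["output_field"])
            if "explanation_field" in step:
                available.add(step["explanation_field"])
    # The analyzer can only compare fields that some step has produced.
    for key in ("tag_field", "label_field"):
        name = config["analyzer"][key]
        if name not in available:
            problems.append(f"analyzer {key} {name!r} is never produced")
    return problems


config = json.loads(CONFIG_JSON)
problems = check_field_wiring(config)
```

For the configuration above, `problems` is empty; renaming `tag_field` without also renaming the classifier's `output_field` would be reported.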

Testing

pip install -e ".[test]"
pytest --cov-report term-missing --cov=squirro.lib.nlp