Yi Liu, 24 September 2019
Home page: https://2019.pyconuk.org
PyCon UK
https://pretalx.com/pyconuk-2019/schedule/
https://github.com/grochmal/nnag
Uses numpy and autograd to demonstrate the inner workings of PyTorch neural net modules.
https://github.com/bonzanini/topic-modelling
https://github.com/janfreyberg/pycon-tutorial-2019
Python 3 type annotations (note: this is my code):
import json
from typing import List

import requests

def get_encode(
    text_list: List[str], model_name: str, url: str
) -> List[List[float]]:
    """Fetch embedding vectors for a list of texts from the encoding service."""
    payload = {"text_list": text_list, "model_name": model_name}
    r = requests.post(f"{url}/encode", data=json.dumps(payload))
    res = r.json()["embeddings"]
    return res
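Since the encoding service lives on an internal URL, here is a sketch of how a client function like this can be exercised offline by stubbing `requests.post` with `unittest.mock` (the embedding values and the URL are made up for illustration):

```python
import json
from typing import List
from unittest import mock

import requests

def get_encode(
    text_list: List[str], model_name: str, url: str
) -> List[List[float]]:
    """Fetch embedding vectors for a list of texts from an encoding service."""
    payload = {"text_list": text_list, "model_name": model_name}
    r = requests.post(f"{url}/encode", data=json.dumps(payload))
    return r.json()["embeddings"]

# Stub the HTTP layer so the sketch runs without the real server.
fake_response = mock.Mock()
fake_response.json.return_value = {"embeddings": [[0.1, 0.2], [0.3, 0.4]]}

with mock.patch.object(requests, "post", return_value=fake_response):
    emb = get_encode(
        ["I love cats", "I hate cats"],
        "biobert_v1.1_pubmed",
        "http://example.invalid",  # placeholder URL, never contacted
    )

print(emb)  # the stubbed embeddings: [[0.1, 0.2], [0.3, 0.4]]
```

Keeping the HTTP call in one thin function makes this kind of stubbing trivial, and the type annotations let mypy check callers without running the server.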
mypy
# A function with wrong types
def foobar(x: str) -> str:
    y = x + 1
    return 1
# Static type analysis with mypy
mypy $(find . | grep "\.py$")
funcs/funcs.py:42: error: Unsupported operand types for + ("str" and "int")
funcs/funcs.py:43: error: Incompatible return value type (got "int", expected "str")
# in:
def very_important_function(template: str, *variables, file: os.PathLike, engine: str, header: bool = True, debug: bool = False):
    """Applies `variables` to the `template` and writes to `file`."""
    with open(file, 'w') as f:
        ...
# out:
def very_important_function(
    template: str,
    *variables,
    file: os.PathLike,
    engine: str,
    header: bool = True,
    debug: bool = False,
):
    """Applies `variables` to the `template` and writes to `file`."""
    with open(file, "w") as f:
        ...
Code styles aren't black and white.
They should all be black. :)
Over its history, NLP has progressed from purely linguistic models to statistical and machine learning models.
word2vec (Mikolov et al., 2013) was among the first batch of models to use neural networks to encode texts into high-dimensional embedding vectors.
Neural network models began to show advantages over other methods thanks to their strength in learning from unstructured data.
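Once texts live in a vector space, semantic similarity reduces to geometry. A minimal sketch with made-up toy vectors (not real word2vec output) showing the standard cosine-similarity comparison:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings", invented for illustration only.
cat = np.array([0.9, 0.1, 0.3, 0.0])
kitten = np.array([0.8, 0.2, 0.4, 0.1])
car = np.array([0.0, 0.9, 0.1, 0.8])

print(cosine_similarity(cat, kitten))  # close to 1: vectors point the same way
print(cosine_similarity(cat, car))     # much smaller: vectors diverge
```

The same measure is what the `/cosine_similarity` endpoint shown later reports, just computed on model-generated embeddings instead of toy vectors.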
The current state-of-the-art models are "transformer" models.
A two-stage approach: pre-train on large unlabelled corpora, then fine-tune on specific downstream tasks.
Transformer-related NLP ecosystems have matured in recent months to the point where we can use them!
The medium-to-long-term goal is to use text embeddings generated by various state-of-the-art algorithms and models for downstream tasks.
Current efforts focus on building the infrastructure and understanding the various frameworks and ecosystems.
import requests
url = "http://jojo.epi.bris.ac.uk:8560"
text_1 = "I love cats"
text_2 = "I hate cats"
model_name = "biobert_v1.1_pubmed"
payload = {
    "text_1": text_1,
    "text_2": text_2,
    "model_name": model_name,
}
r = requests.get(f"{url}/cosine_similarity", params=payload)
# text_1 = "I love cats"
# text_2 = "I hate cats"
res = r.json()
res
{'cosine_similarity': 0.9612499346597072}