# Vocab

To create the `vocab.txt` file, run **make_new_vocab.py**.

# Prep dataset

**prep_dataset_training**: Formats and splits the dataset so it can be used for training. Before running, set which dataset version to create.

# Train German FoodBERT

**language_modeling**: Trains German FoodBERT.

# Vocab Files

**bert-base-german-cased_tokenizer.json**: original bert-base-german-cased tokenizer file

**bert_vocab.txt**: original bert-base-german-cased vocab

**used_ingredients**: all ingredients that occur in the dataset

**vocab.txt**: German FoodBERT vocabulary
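
The files above suggest that `vocab.txt` is built by extending the original vocabulary with ingredient names. The following is only a minimal sketch of that idea, not the contents of **make_new_vocab.py**: the helper names (`extend_vocab`, `read_lines`, `write_vocab`) are hypothetical, and it assumes that each file holds one token per line.

```python
from pathlib import Path


def extend_vocab(base_tokens, new_tokens):
    """Append new tokens to a base vocabulary.

    Keeps the base order (so existing token ids stay stable) and skips
    blank entries and tokens that are already present.
    """
    seen = set(base_tokens)
    merged = list(base_tokens)
    for token in new_tokens:
        token = token.strip()
        if token and token not in seen:
            seen.add(token)
            merged.append(token)
    return merged


def read_lines(path):
    """Read a one-token-per-line file (assumed format)."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def write_vocab(path, tokens):
    """Write tokens back out, one per line."""
    Path(path).write_text("\n".join(tokens) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Tiny in-memory demo with made-up tokens instead of the real files.
    base = ["[PAD]", "[UNK]", "und"]
    ingredients = ["Zwiebel", "und", "Knoblauch", "Zwiebel"]
    print(extend_vocab(base, ingredients))
```

With the real files, the same helpers could be combined as `write_vocab("vocab.txt", extend_vocab(read_lines("bert_vocab.txt"), read_lines("used_ingredients")))`, assuming those paths and formats match the repository.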