added to READMEs
@@ -3,6 +3,8 @@ Unzip German FoodBERT models here!
They can be found under https://cloud.marquis.site/s/ZUVIIIQv6yznBj6
For each version, the trained model is located in "output", and "eval" contains the generated substitute recommendations.
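As an illustration of how substitute recommendations can be derived from a model like FoodBERT, the sketch below ranks candidate ingredients by cosine similarity of their embedding vectors. The embeddings here are made-up toy vectors and the function name is hypothetical; the files in "eval" are produced by the repository's own scripts.

```python
# Illustrative sketch only: ranking ingredient substitutes by cosine
# similarity of embedding vectors. The vectors below are toy values,
# not real FoodBERT embeddings.
import numpy as np

def top_substitutes(query, embeddings, k=2):
    """Return the k ingredient names whose vectors are most similar to `query`."""
    names = list(embeddings)
    matrix = np.stack([embeddings[n] for n in names])
    q = embeddings[query]
    sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    ranked = [n for _, n in sorted(zip(sims, names), reverse=True) if n != query]
    return ranked[:k]

toy = {
    "Butter":    np.array([0.9, 0.1, 0.0]),
    "Margarine": np.array([0.8, 0.2, 0.1]),
    "Zucker":    np.array([0.1, 0.9, 0.2]),
    "Honig":     np.array([0.2, 0.8, 0.3]),
}
print(top_substitutes("Butter", toy))  # ['Margarine', 'Honig']
```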
## Datasets
Each model has a folder "dataset" with the following files:
@@ -7,6 +7,12 @@ To create vocab.txt file, run **make_new_vocab.py**
# train German FoodBERT
**language_modeling**
This was executed on Google Colab with the following parameters:
```
!python /content/drive/MyDrive/masterarbeit/language_modeling.py --output_dir="/content/drive/MyDrive/masterarbeit/output" --model_type=bert --model_name=bert-base-german-cased --do_train --train_data_file="/content/drive/MyDrive/masterarbeit/data/training_data.txt" --do_eval --eval_data_file="/content/drive/MyDrive/masterarbeit/data/testing_data.txt" --mlm --line_by_line --per_device_train_batch_size=8 --gradient_accumulation_steps=2 --per_device_eval_batch_size=8 --save_total_limit=5 --save_steps=10000 --logging_steps=10000 --evaluation_strategy=epoch --model_name_or_path="bert-base-german-cased"
```
The exclamation mark at the beginning of the line is only needed on Google Colab and can be omitted when executing locally. Paths need to be adjusted when executing.
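Note that `--per_device_train_batch_size=8` combined with `--gradient_accumulation_steps=2` gives an effective batch size of 16. The toy sketch below illustrates the mechanism with a linear least-squares model: gradients from two micro-batches of 8 are summed before one optimizer step, matching a single step on the full batch. This is an illustration of the concept only, not the Trainer's actual implementation.

```python
# Toy illustration of gradient accumulation: two micro-batches of 8,
# gradients averaged with a 1/accum_steps factor, then one optimizer
# step -- equivalent to a single step on the full batch of 16.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)                                   # toy model parameters
X, y = rng.normal(size=(16, 3)), rng.normal(size=16)

accum_steps, micro_bs, lr = 2, 8, 0.1
grad = np.zeros_like(w)
for step in range(accum_steps):
    xb = X[step * micro_bs:(step + 1) * micro_bs]
    yb = y[step * micro_bs:(step + 1) * micro_bs]
    # mean-squared-error gradient for this micro-batch, scaled by 1/accum_steps
    grad += 2 * xb.T @ (xb @ w - yb) / len(xb) / accum_steps
w -= lr * grad                                    # one optimizer step

# the same update computed directly on the full batch of 16
w_full = -lr * (2 * X.T @ (X @ np.zeros(3) - y) / len(X))
print(np.allclose(w, w_full))  # True
```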
# Vocab Files:
**bert-base-german-cased_tokenizer.json**: original bert-base-german-cased tokenizer file
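For context on what a vocab-file conversion involves: a WordPiece `tokenizer.json` stores its vocabulary as a token-to-id mapping under `model.vocab`, while `vocab.txt` lists one token per line ordered by id. The sketch below uses a tiny inline stand-in for the JSON; **make_new_vocab.py** in the repository is the actual conversion script.

```python
# Sketch of the tokenizer.json -> vocab.txt idea: read the token -> id
# mapping under model.vocab and emit tokens sorted by id, one per line.
# The JSON here is a tiny inline stand-in, not the real tokenizer file.
import json

tokenizer_json = json.loads("""
{"model": {"type": "WordPiece",
           "vocab": {"[PAD]": 0, "[UNK]": 1, "Brot": 2, "##chen": 3}}}
""")

vocab = tokenizer_json["model"]["vocab"]
lines = [token for token, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
print("\n".join(lines))
```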