initial commit of project

2021-04-11 19:51:12 +02:00
commit a21a8186d9
110 changed files with 16326178 additions and 0 deletions

clean_dataset/README.md Normal file

@@ -0,0 +1,12 @@
# Prepare Dataset for Training
## Requirements
Install the spaCy model:
python -m spacy download de_core_news_lg
## Steps
1. combine_craped_dataset_parts: Combine the parts of the dataset if needed.
Adapt the paths and the number of dataset parts before running it.
2. dataset_helpers: Clean the ingredients and keep only those that are used more than 20 times.
3. dataset_instructions_helpers: Clean the ingredients in the instruction steps and split each step into sentences.
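
The first two steps can be sketched roughly as follows. This is a minimal illustration, not the code from `combine_craped_dataset_parts` or `dataset_helpers`: the function names `combine_parts` and `filter_rare_ingredients`, the recipe layout (a dict with an `ingredients` list), and the choice to count every occurrence of an ingredient are all assumptions made for the example.

```python
from collections import Counter

# Threshold from step 2: keep ingredients used more than 20 times.
MIN_COUNT = 20


def combine_parts(parts):
    """Concatenate several dataset parts (lists of recipes) into one list.

    Hypothetical helper; the real parts may be stored differently.
    """
    combined = []
    for part in parts:
        combined.extend(part)
    return combined


def filter_rare_ingredients(recipes, min_count=MIN_COUNT):
    """Return copies of the recipes without ingredients used min_count times or fewer.

    Assumes each recipe is a dict with an "ingredients" list of strings.
    """
    # Count every occurrence of every ingredient across the whole dataset.
    counts = Counter(
        ingredient for recipe in recipes for ingredient in recipe["ingredients"]
    )
    frequent = {name for name, n in counts.items() if n > min_count}
    return [
        {**recipe, "ingredients": [i for i in recipe["ingredients"] if i in frequent]}
        for recipe in recipes
    ]


if __name__ == "__main__":
    part_a = [{"title": "Pfannkuchen", "ingredients": ["mehl", "ei", "milch"]}]
    part_b = [{"title": "Ruehrei", "ingredients": ["ei", "safran"]}]
    recipes = combine_parts([part_a, part_b])
    # With a threshold of 1, only "ei" (used twice) survives.
    print(filter_rare_ingredients(recipes, min_count=1))
```

Filtering after combining matters here: counting per part would drop ingredients that only pass the threshold across the full dataset.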