Data Training
The Data Training tab is where you upload and prepare content that the chatbot will use to generate accurate and helpful responses.
This tab supports three types of data ingestion:
- Upload Files
- Import via Sitemap with Regex
- Enter Direct URLs
Upload Files
You can upload text and document files:
- Supported formats include PDF, TXT, CSV, DOCX, and PPTX
- Size limit: 30MB in total, and no single file may exceed 15MB
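You can check a batch against these limits on your own machine before uploading. The Python sketch below is not part of the product. It assumes 1MB = 1,000,000 bytes, which is the stricter reading because the product might count 1,048,576 bytes per MB. It also assumes the 30MB total applies to all files chosen in one batch. The function names are hypothetical.

```python
import os
from typing import Iterable, List, Tuple

# Assumption: 1MB = 1,000,000 bytes. This is the stricter reading; the
# product might count 1,048,576 bytes per MB instead.
MB = 1_000_000
MAX_FILE_BYTES = 15 * MB   # per-file limit from this guide
MAX_TOTAL_BYTES = 30 * MB  # combined limit from this guide
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".csv", ".docx", ".pptx"}


def check_upload_batch(files: Iterable[Tuple[str, int]]) -> List[str]:
    """Return problems found in (filename, size_in_bytes) pairs.

    An empty list means the batch is within the documented limits.
    """
    problems = []
    total = 0
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            label = ext or "(no extension)"
            problems.append(f"{name}: unsupported format '{label}'")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 15MB per-file limit")
        total += size
    if total > MAX_TOTAL_BYTES:
        problems.append(f"batch: {total} bytes exceeds the 30MB total limit")
    return problems


def check_paths(paths: List[str]) -> List[str]:
    """Run the same check on files on disk, reading their sizes."""
    return check_upload_batch((p, os.path.getsize(p)) for p in paths)


if __name__ == "__main__":
    sample = [("handbook.pdf", 12 * MB), ("faq.docx", 10 * MB), ("notes.md", 1 * MB)]
    for problem in check_upload_batch(sample):
        print(problem)
```

For the sample batch this prints a single line, `notes.md: unsupported format '.md'`. The combined 23MB stays under the total limit. To check real files, pass their paths to `check_paths`.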
Steps to Upload
- Click Choose Files and select your document(s)
- Click Upload
- Once uploaded, files will appear below the upload field
- You can:
  - Click the ❌ cross icon to remove files before submission
  - Click Submit to train them
Note: If your file does not appear under Uploaded Data after processing, the upload was incomplete or training failed. Try uploading it again.
Advanced Data Extraction via Sitemap
This section lets you import structured web content using a sitemap and an optional regular expression filter.
Fields
- Sitemap: Enter your site’s sitemap URL (e.g., https://domain.com)
- Regular Expression: (Optional) Filter specific URLs (e.g., /travel, release/2023)
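To see which pages a regular expression would keep, the sketch below parses a standard XML sitemap (the sitemaps.org `<urlset>` format) and filters its URLs. It assumes the filter matches anywhere in the URL, as Python's `re.search` does. The product may anchor or interpret patterns differently, and sitemap index files are not handled. The sample sitemap and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

# XML namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap used for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.com/travel/paris</loc></url>
  <url><loc>https://domain.com/release/2023/notes</loc></url>
  <url><loc>https://domain.com/about</loc></url>
</urlset>"""


def urls_from_sitemap(xml_text: str) -> List[str]:
    """Return every <loc> URL listed in a standard <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def filter_urls(urls: List[str], pattern: Optional[str]) -> List[str]:
    """Keep URLs in which the pattern matches anywhere (re.search semantics).

    Assumption: the product may anchor or interpret the pattern differently.
    """
    if not pattern:
        return list(urls)  # the filter is optional: no pattern keeps everything
    regex = re.compile(pattern)
    return [u for u in urls if regex.search(u)]


if __name__ == "__main__":
    print(filter_urls(urls_from_sitemap(SAMPLE), "/travel"))
```

With the pattern `/travel`, only `https://domain.com/travel/paris` is kept. Under the same assumption, the two example filters can be combined with an alternation such as `/travel|release/2023`.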
Actions
- Click Preview to review fetched content
- A Preview Modal opens where you can:
  - Toggle Auto Crawl (on = periodic auto-refresh; off = static data)
  - Click Cancel to exit or Train to begin processing
Enter Comma-Separated URLs
Quickly train your bot using a list of specific page URLs.
Example format: https://abc.com, https://abc.com/travel, https://xyz.com/test
- Paste multiple URLs separated by commas
- Click Preview to review fetched content
- A Preview Modal opens where you can:
  - Toggle Auto Crawl (on = periodic auto-refresh; off = static data)
  - Click Cancel to exit or Train to begin processing
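The sketch below shows one way to tidy a comma-separated list before you paste it. It splits on commas, trims spaces, drops empty entries and duplicates, and sets aside anything that is not an absolute http(s) URL. It follows the example format above, but what the product actually accepts is an assumption, and the function name is hypothetical. A URL that itself contains a comma would be split incorrectly by this approach.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def parse_url_list(raw: str) -> Tuple[List[str], List[str]]:
    """Split comma-separated input into (valid_urls, rejected_entries)."""
    valid: List[str] = []
    rejected: List[str] = []
    seen = set()
    for entry in raw.split(","):
        url = entry.strip()
        if not url:
            continue  # skip blanks, e.g. from a trailing comma
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            if url not in seen:  # drop duplicates, keep first-seen order
                seen.add(url)
                valid.append(url)
        else:
            rejected.append(url)  # e.g. "abc.com" with no scheme
    return valid, rejected


if __name__ == "__main__":
    ok, bad = parse_url_list("https://abc.com, https://abc.com/travel, https://xyz.com/test")
    print(ok, bad)
```

For the example list, all three URLs are returned as valid and nothing is rejected.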
Training Flow Logic
- After you Submit or Train, the data enters a Queued state
- If the training is successful, the content will move to the Uploaded Data tab
- If the content does not appear in the Uploaded Data tab after processing, the training failed. Retry the upload or check the formatting.
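The training flow can be modeled as a small state machine. In this sketch every submitted item starts as QUEUED and becomes UPLOADED on success. The guide does not name a failure state, so FAILED here is only a label for "never reached Uploaded Data". All names are hypothetical; the guide does not describe an API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    QUEUED = "queued"      # submitted, still being processed
    UPLOADED = "uploaded"  # training succeeded; listed under Uploaded Data
    FAILED = "failed"      # hypothetical label: never reached Uploaded Data


@dataclass
class TrainingItem:
    source: str                      # file name or URL
    status: Status = Status.QUEUED   # every Submit/Train starts here


def finish_training(item: TrainingItem, succeeded: bool) -> None:
    """Move a queued item to its final state."""
    if item.status is not Status.QUEUED:
        raise ValueError(f"{item.source} is not queued")
    item.status = Status.UPLOADED if succeeded else Status.FAILED


def needs_retry(items: List[TrainingItem]) -> List[str]:
    """Sources that finished without reaching Uploaded Data."""
    return [i.source for i in items if i.status is Status.FAILED]
```

In this model, retrying means submitting the source again, which creates a new QUEUED item.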
Key Concepts
- ✅ Uploaded Data: These are the only documents and URLs the chatbot uses to answer questions.
- 🕓 Queued Data: Content that is still being processed.
  Note: Queued content does not affect chatbot responses until it appears in Uploaded Data.
- ⚙️ Auto Crawl Toggle:
  When enabled, your chatbot will periodically re-fetch and retrain on the URL’s content.
  When disabled, it stays fixed and won’t update unless retrained manually.
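These key concepts can be expressed as two rules. First, only uploaded items are eligible as answer sources. Second, Auto Crawl decides whether an uploaded item is re-fetched. The guide says only that refreshes are periodic, so the one-day interval below is an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical refresh interval; the guide only says Auto Crawl is periodic.
RECRAWL_INTERVAL = timedelta(days=1)


@dataclass
class Source:
    url: str
    status: str                            # "queued" or "uploaded"
    auto_crawl: bool = False
    last_crawled: Optional[datetime] = None


def answerable_sources(sources: List[Source]) -> List[Source]:
    """Only Uploaded Data feeds answers; queued items are ignored."""
    return [s for s in sources if s.status == "uploaded"]


def due_for_recrawl(source: Source, now: datetime,
                    interval: timedelta = RECRAWL_INTERVAL) -> bool:
    """Auto Crawl on: re-fetch once the interval has elapsed.

    Auto Crawl off: never re-fetched here; the content changes only
    through a manual retrain.
    """
    if source.status != "uploaded" or not source.auto_crawl:
        return False
    if source.last_crawled is None:
        return True
    return now - source.last_crawled >= interval
```

Keeping Auto Crawl off for static pages avoids unnecessary re-fetching, which matches the best practice below.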
Best Practices
- Always verify file size and format before uploading.
- Use clear and crawlable web pages for best URL training results.
- Regularly monitor Queued Data and retry failed uploads if needed.
- Keep Auto Crawl enabled only for content that changes frequently.