Website
The Website Connector allows you to ingest public content under a specific web address.
Setting Up the Website Connector
Step 1: Select Website from the Data Sources Library
Navigate to the Data Sources section of your project. Click “Add data source” and select Website.
Step 2: Configure the connector
In the Website address field, enter the URL. By default, every page under that address is crawled and ingested into your data source.
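To make "every page under that address" concrete, here is a minimal Python sketch of one plausible reading: a page counts as being under the address when it is on the same host and its path sits at or below the path of the address you entered. The function name is_under_address and the matching rule are assumptions made for illustration. They are not taken from the connector, which may scope its crawl differently.

```python
from urllib.parse import urlsplit


def is_under_address(base: str, url: str) -> bool:
    """Hypothetical check: is `url` on the same host as `base`,
    at or below the base path? (Assumed rule, not the connector's.)"""
    b, u = urlsplit(base), urlsplit(url)
    # Different host means the page is not under the address.
    if b.netloc.lower() != u.netloc.lower():
        return False
    base_path = b.path.rstrip("/")
    path = u.path.rstrip("/")
    # Match the base path itself, or anything nested below it.
    # Appending "/" avoids treating /docs-old as a child of /docs.
    return path == base_path or path.startswith(base_path + "/")
```

Under this reading, entering https://example.com/docs would cover https://example.com/docs/setup but not https://example.com/docs-old/setup or any page on another host.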
Step 3: Advanced settings
In advanced settings, you can limit ingestion to specific pages, which helps you keep your embedding cost down.
You can specify the number of pages to ingest under the web address. The maximum, which is also the default, is 10 000.
You can add or exclude specific URLs. These URLs are added to, or excluded from, the number of pages you selected.
You can restrict ingestion to URLs that contain a specific phrase.
You can enable or disable the ingestion of external links found in the website content.
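The way these settings might combine can be sketched in Python. This is only an illustrative model, not the connector's implementation: the names AdvancedSettings and select_pages, the field names, the order in which the filters apply, and the default for external links are all assumptions. In particular, the sketch assumes that explicitly added URLs are appended on top of the page limit, which is one possible reading of the setting.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit

MAX_PAGES = 10_000  # maximum and default page count stated in this guide


@dataclass
class AdvancedSettings:
    """Hypothetical container mirroring the advanced settings."""
    max_pages: int = MAX_PAGES
    include_urls: List[str] = field(default_factory=list)
    exclude_urls: List[str] = field(default_factory=list)
    url_phrase: Optional[str] = None
    follow_external_links: bool = False  # assumed default


def select_pages(base: str, discovered: List[str],
                 settings: AdvancedSettings) -> List[str]:
    """Return the URLs that would be ingested under the assumed rules."""
    base_host = urlsplit(base).netloc.lower()
    excluded = set(settings.exclude_urls)
    selected: List[str] = []
    seen = set()
    for url in discovered:
        # Skip duplicates and explicitly excluded URLs.
        if url in seen or url in excluded:
            continue
        seen.add(url)
        # Drop links to other hosts unless external links are enabled.
        is_external = urlsplit(url).netloc.lower() != base_host
        if is_external and not settings.follow_external_links:
            continue
        # Keep only URLs containing the phrase, if one is set.
        if settings.url_phrase and settings.url_phrase not in url:
            continue
        selected.append(url)
    # Apply the page limit, never exceeding the documented maximum.
    selected = selected[:min(settings.max_pages, MAX_PAGES)]
    # Assumption: explicitly added URLs are appended beyond the limit.
    for url in settings.include_urls:
        if url not in excluded and url not in selected:
            selected.append(url)
    return selected
```

For example, with a limit of 2 pages, the phrase "/docs/", one excluded URL, and one added URL, the sketch keeps only the crawled pages whose URL contains "/docs/" and that are not excluded, then appends the added URL. Lowering the page limit or narrowing the phrase shrinks the set of pages that need to be embedded.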
Step 4: View Ingestion Status
To view the current ingestion status, click the data source again. The detailed list shows every ingested page by its URL.
Step 5: Ready to Use
Once the data has been ingested successfully, the data source is ready to be used in an Agent.