With Hugging Face
This page outlines the steps involved in using the Local LLM integration.
This feature is in the experimental phase and might contain bugs. Your bug reports, sent to [email protected], are essential for enhancing its stability.
Prior to submitting bug reports, please ensure that your system meets the requirements outlined in the Installation section.
Go to the Server tab.
Start the server by clicking the Start server button. The initial launch may take some time, so please wait until the message Server is running on port <PORT> appears. You can view the server status, including the PID of the running process, at the bottom of the view.
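If you want to confirm the server is reachable from outside the UI, a quick TCP probe is enough. The sketch below is illustrative and uses only the Python standard library; the port number is whatever the Server tab reports in the Server is running on port <PORT> message.

```python
import socket


def server_is_running(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something is accepting TCP connections on host:port.

    This only checks that the port is open; it does not verify that the
    process behind it is the BurpGPT Pro local server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `server_is_running(8080)` returns True only while a server is listening on port 8080 of the local machine.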
The local server powers the local LLM capabilities of BurpGPT Pro. All computations are performed locally, ensuring complete data privacy for your prompts and HTTP traffic.
If access to the system PATH is restricted, manually provide the Python executable's absolute path in the Python path field to ensure the local server can start. If the field is left blank, the Python binary is detected automatically via the system PATH.
Switch to the Local LLM tab and select one of the pre-built models from the Model dropdown field. The number of datapoints used to train the selected model is displayed under the Model size field.
When selecting certain models from the Hugging Face hub, such as meta-llama/Meta-Llama-3.1-8B, you might encounter the following error:
Failed to load model and tokenizer: You are trying to access a gated repository. Make sure you have access to it at https://huggingface.co/<MODEL>.
If this happens, follow these steps:
Request Access: Complete the COMMUNITY LICENSE AGREEMENT form located on the model's repository page. You may need to agree to share your contact information.
Authenticate: Log in with your Hugging Face account by following the instructions at Hugging Face CLI Login.
Retry: After completing the above steps, attempt to load the model again.
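Before retrying, it can help to confirm that an access token is actually configured. The stdlib-only sketch below checks the HF_TOKEN environment variable and the token file written by the Hugging Face CLI login; the file location is an assumption based on huggingface_hub's default cache directory and may differ if your installation overrides it.

```python
import os
from pathlib import Path


def hf_token_available() -> bool:
    """Return True if a Hugging Face access token appears to be configured.

    Checks the HF_TOKEN environment variable first, then the token file
    that the CLI login writes under the cache directory (assumed default:
    ~/.cache/huggingface/token, overridable via HF_HOME).
    """
    if os.environ.get("HF_TOKEN"):
        return True
    cache_dir = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    token_file = cache_dir / "token"
    return token_file.is_file() and token_file.read_text().strip() != ""
```

If this returns False after you have requested access, re-run the CLI login before attempting to load the gated model again.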
Keep in mind that the more datapoints are used to train a model, the larger the resulting model will be. In some cases, the model can reach several gigabytes, which may increase processing time for your queries.
When selecting models on the Hugging Face hub, it is recommended to choose instruct models, typically suffixed with it or instruct. These models work best with BurpGPT Pro. The built-in list includes examples from models provided by Google, Meta, Microsoft, and the OpenAI Community.
To optimise the performance of your local model, set the Max prompt length and Max token length parameters appropriately. By adjusting these parameters, you control how much information you can provide to the model and the length of its response.
Max prompt length: determines the maximum size of your prompt once the placeholders have been replaced.
Max token length: specifies the maximum combined length of the prompt and the model response. This value depends on the model type and technology. For instance, GPT-2-based models usually have a max token length of 1,024, while GPT-3-based models allow a larger value of 2,048.
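The interaction between the two limits can be sketched as a simple token budget: the prompt is first clipped to Max prompt length, and whatever remains of Max token length is available for the response. This is an illustrative model of the arithmetic, not the extension's actual accounting, and the function name is hypothetical.

```python
def response_budget(max_token_length: int, max_prompt_length: int, prompt_tokens: int) -> int:
    """Tokens left for the model's response after the prompt is accounted for.

    The prompt is clipped to max_prompt_length; prompt and response together
    may not exceed max_token_length (e.g. 1,024 for GPT-2-style models,
    2,048 for GPT-3-style models).
    """
    used = min(prompt_tokens, max_prompt_length)
    return max(max_token_length - used, 0)
```

For example, with a GPT-2-style limit of 1,024 tokens and a 512-token prompt cap, even an oversized prompt still leaves 512 tokens for the response, while a 100-token prompt leaves 924.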