Add explanation for Windows users on how to create EXE files #419

Open
fabiomatricardi opened this issue May 15, 2024 Discussed in #418 · 4 comments

Comments

@fabiomatricardi

Discussed in #418

Originally posted by fabiomatricardi May 15, 2024
Hi,
I tried asking in the Discord channel but got no replies... so after a week of struggling I figured out how to do it.
I would like this to be on the main page of the repo.

How to create .exe files on Windows

  • download a GGUF smaller than 4 GB (in my example qwen1_5-0_5b-chat-q8_0.gguf from the official Qwen repo: it already has the chat template and tokenizer included in the GGUF)
  • download the zip file for the latest llamafile release here and unzip it in the same folder as the GGUF
  • rename the llamafile binary's extension to .exe

  • download zipalign from here and unzip it in the same folder
  • rename its extension to .exe as well (see the example rename commands after this list)
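
For example, assuming the extracted binaries are named llamafile-0.8.4 and zipalign (the exact names depend on the release you downloaded), the renames can be done from the terminal:

ren llamafile-0.8.4 llamafile-0.8.4.exe
ren zipalign zipalign.exe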

In my case I want the executable to run the API server with a few more arguments (context length).

Create a .args file as explained in Creating Llamafiles

The file will contain:

-m
qwen1_5-0_5b-chat-q8_0.gguf
--host
0.0.0.0
-c
12000
...
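
If you prefer to create the .args file from the terminal rather than in a text editor, a PowerShell here-string is one way to do it (just a sketch; any editor that saves plain text works equally well):

@"
-m
qwen1_5-0_5b-chat-q8_0.gguf
--host
0.0.0.0
-c
12000
...
"@ | Set-Content -Path .args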

In the terminal, run the following to create the base binary:

copy .\llamafile-0.8.4.exe qwen1_5-0_5b-chat-q8.llamafile

Then use zipalign to bundle the llamafile, the GGUF file, and the arguments together:

.\zipalign.exe -j0 qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8_0.gguf .args

Finally, rename the .llamafile to .exe:

ren qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8.exe

Run the Qwen model

From the terminal, run:

.\qwen1_5-0_5b-chat-q8.exe --nobrowser

This will load the model and start the web server without opening the browser. (The trailing ... line in the .args file is what allows extra arguments such as --nobrowser to be passed on the command line.)
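
As an optional sanity check, you can verify the server is answering by fetching the built-in web UI with curl (curl.exe ships with recent versions of Windows; any HTTP client works):

curl.exe http://localhost:8080/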

Python API call

from openai import OpenAI
import sys

# Point to the local server
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
history = [
    {"role": "system", "content": "You are QWEN05, an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful. Always reply in the language of the instructions."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]
print("\033[92;1m")
while True:
    userinput = ""
    completion = client.chat.completions.create(
        model="local-model", # this field is currently unused
        messages=history,
        temperature=0.3,
        frequency_penalty  = 1.4,
        max_tokens = 600,
        stream=True,
    )

    new_message = {"role": "assistant", "content": ""}
    
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content

    history.append(new_message)

    print("\033[1;30m")  #dark grey
    print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:")
    print("\033[91;1m")  #red
    lines = sys.stdin.readlines()  # read everything up to EOF (Ctrl+Z + Enter on Windows, Ctrl+D on Unix)
    for line in lines:
        userinput += line  # readlines() already keeps the trailing newline
    if lines and "quit!" in lines[0].lower():
        print("\033[0mBYE BYE!")
        break
    # Reset the history each turn: keep only the system prompt plus the new user input
    history = [
            {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
            ]
    history.append({"role": "user", "content": userinput})
    print("\033[92;1m")

The input accepts multi-line entries: when finished, press Ctrl+Z and Enter.

To exit, type quit! followed by Ctrl+Z and Enter.
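
To try the script, save it to a file (for example chat.py; the filename is arbitrary), install the openai client package it imports, and run it while the llamafile server is running:

pip install openai
python chat.py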

@mofosyne
Collaborator

Are you able to make a PR proposal with your proposed changes so it can be reviewed and potentially merged in?

@fabiomatricardi
Author

sure... how?

@mofosyne
Collaborator

@jart is this actually more suitable for a wiki instead? If so then could you enable it? This might be more of a freeform doc than a formal instruction.

Otherwise, would it make more sense for him to make a new folder /docs/ in the repo?

@fabiomatricardi how experienced are you with making GitHub contributions? If you're not very experienced then we can try to accommodate you, but I do recommend you learn a bit about how to use GitHub Pull Requests so your contributions can be more easily tracked.

@fabiomatricardi
Author

fabiomatricardi commented May 23, 2024 via email
