Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
FantasticSeaurchin8
Moderator
3 Questions, 26 Answers
  Active since 02 August 2024
  Last activity 27 days ago

Reputation

0

Badges 1

25 × Eureka!
0 Votes
13 Answers
168 Views
0 Votes 13 Answers 168 Views
one month ago
0 Votes
17 Answers
133 Views
0 Votes 17 Answers 133 Views
Hi all Im trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. ...
one month ago
0 Votes
2 Answers
83 Views
0 Votes 2 Answers 83 Views
Was there ever a solution to this request? https://faq.clear.ml/question/1546665636485140480/is-there-any-way-to-change-the-x-[…]s-to-say-e-g-epochs-instead-...
one month ago
0 Hi Everyone, I'M Having Trouble Setting My Output_Uri Such That My Model Checkpoints Are Saved Outside Of The Venv And Accessible Via Id Or For Download. Im Running Clear-Ml On A Remote Server Through Docker And I Believe Clearml Is Unable To Resolve The

docker compose on server:


  apiserver:
    command:
    - apiserver
    container_name: clearml-apiserver
    image: allegroai/clearml:latest
    privileged: true
    restart: unless-stopped
    volumes:
    - ${LOGS_DIR}:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    - ${FILESERVER_DATA_DIR}:/mnt/fileserver
    depends_on:
      - redis
      - mongo
      - elasticsearch
      - fileserver
    environment:
      CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
      C...
one month ago
0 Hi Everyone, I'M Having Trouble Setting My Output_Uri Such That My Model Checkpoints Are Saved Outside Of The Venv And Accessible Via Id Or For Download. Im Running Clear-Ml On A Remote Server Through Docker And I Believe Clearml Is Unable To Resolve The

which logs are helpful? console output, fileserver, or api?

my main issue is that i can see that the model artefacts are here file:///root/.clearml/venvs-builds/3.11/task_repository/my_awesome_facility_model/checkpoint-33/scheduler.pt
which i believe is not persistent/retrievable with an artefact id

one month ago
0 Hi Everyone, I'M Having Trouble Setting My Output_Uri Such That My Model Checkpoints Are Saved Outside Of The Venv And Accessible Via Id Or For Download. Im Running Clear-Ml On A Remote Server Through Docker And I Believe Clearml Is Unable To Resolve The

this is how im initializing before calling my training function. this is inside my training_script:

task = Task.init(project_name=project_name, task_name=task_name, output_uri=True) 
one month ago
0 Hi Everyone, I'M Having Trouble Setting My Output_Uri Such That My Model Checkpoints Are Saved Outside Of The Venv And Accessible Via Id Or For Download. Im Running Clear-Ml On A Remote Server Through Docker And I Believe Clearml Is Unable To Resolve The

This issue was solved by adding task.output_uri = "fileserver" in the scheduling script, but for some reason this does not work when setting in task.create call in the same script but needs to be set after. it also doesnt work when being set in the training script, so there must have been some unknown overriding

one month ago
0 Hi Everyone, I'M Having Trouble Setting My Output_Uri Such That My Model Checkpoints Are Saved Outside Of The Venv And Accessible Via Id Or For Download. Im Running Clear-Ml On A Remote Server Through Docker And I Believe Clearml Is Unable To Resolve The

local conf

api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: 

    web_server: 

    files_server: 

    # Credentials are generated using the webapp, 

    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "somekey", "secret_key": "somekey"}
}

 # Default Task output_uri. if output_uri is not provided to Task.init, default_outp...
one month ago
0 Hi Everyone, I'M Having Trouble Setting My Output_Uri Such That My Model Checkpoints Are Saved Outside Of The Venv And Accessible Via Id Or For Download. Im Running Clear-Ml On A Remote Server Through Docker And I Believe Clearml Is Unable To Resolve The

way im attempting to access with an id

cl_model_id = config.get("model_id")
    #model = Model(model_id=cl_model_id)
    model = InputModel(model_id=cl_model_id)
    checkpoint = model.get_local_copy()
one month ago
0 Hi All Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag. I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The

console output:

clearml.Task - INFO - Completed model upload to file_server/training.e5f99149b9b/models/optimizer.pt
clearml.Task - INFO - Completed model upload to file_server/training.e5f99149b9b/models/scheduler.pt
clearml.Task - INFO - Completed model upload to file_server/training.e5f99149b9b/models/rng_state.pth
save_model
somemodel/checkpoint-198
clearml.Task - INFO - Completed model upload to file_server/training.e5f99149b9b/models/training_args.bin
one month ago
0 Hi All Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag. I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The

called with:


task = Task.init(
    project_name=project_name, task_name=task_name, output_uri="fileserver_address"
)


task.connect(config)


checkpoint = config.get("model_path")

image_processor = AutoImageProcessor.from_pretrained(
        checkpoint,
        num_labels=config.get("class_number"),
    )

best_model = training(checkpoint, image_processor)
one month ago
0 Hi All Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag. I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The

queued with:

task = Task.create(
    project_name="name",
    task_name="training",
    repo="repo",
    branch="branch",
    script="training_script",
    packages=package_list,
    docker="docker_gpu_image",
    docker_args=["--network=host"],
)
task.output_uri = "filer_server"
task.enqueue(task, "training")
one month ago
0 Hi All Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag. I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The

training function:

def training(checkpoint, image_processor):

    data_test_train, labels, label_to_id, id_to_label = pre_process()

    model = AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        id2label=id_to_label,
        label2id=label_to_id,
        ),
    )
    def metrics(eval_pred):
        metric_val = config.get("eval_metric")
        metric = evaluate.load(metric_val)
        predictions, labels = eval_pred
        ...
one month ago
0 Hi All Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag. I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The

it would seem they are related but i cant see the further details of this bug. Either doing a manual artefact upload with task or turning tensor board tracking off in the hugging face trainer both seemed to enable json tracking within the checkpoints. But I would have thought the tensorboard behavior wasnt desired.

28 days ago
0 Hi All Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag. I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The

So we have managed to get whole checkpoint files to save by removing the save_total_limit from training, this seems to save checkpoint folders with all files in it. however now we have a ballooning server.

did discover this None
and wondering if there's some nuance in autotracking that needs to be circumvented

one month ago
0 Hi All Im Trying To Save My Model Checkpoints During Runtime But Am Running Into A Confusing Snag. I'M Using The Huggingface Architecture For A Transformer. Using Their Training Module To Control Training. In The Training Args, I Have The

so turning report_to="tensorboard", off seemed to solve the issue...as in the training run saves checkpoints as you would expect. That doesnt seem like desired behavior..

one month ago
0 Was There Ever A Solution To This Request?

UI, in the dashboard. I know I could create my own custom plot and track it but it seems odd not to have epoch as configurable option

28 days ago
0 Hi Everyone, I'M Having Trouble Setting My Output_Uri Such That My Model Checkpoints Are Saved Outside Of The Venv And Accessible Via Id Or For Download. Im Running Clear-Ml On A Remote Server Through Docker And I Believe Clearml Is Unable To Resolve The

sure!

So this is how im queuing the job:

task = Task.create(
    project_name="multiclass-classifier",
    task_name="training",
    repo="reponame_url",
    branch="branch_name",
    script='training_script_name',
    packages=package_list,
    docker="python:3.11",
    docker_args="--privileged"

)
task.enqueue(task, "services")  # services queue is the one with a remote worker
one month ago
0 Hi All, I Wanted To Know About Saving Datasets, We Want To Specify The Path To Gs By Default, As I Understand By Default It Uses The Path To File_Server? We Tried Sdk.Development.Default_Output_Uri =

I ran into trouble with this, i found for saving data you need to have it specified in the conf. even though as far as im aware setting it as par tof a task is supposed to overwrite this.

further i found that the server wasnt able to resolve itself as a destination without providing an alias to the server name in the server side docker.

finally when it comes to saving artifacts it seems this had to be set in task.output_uri and not in the create or init :man-shrugging:

one month ago