FantasticSeaurchin8
Moderator
3 Questions, 26 Answers
  Active since 02 August 2024
  Last activity 4 months ago

Reputation: 0

Badges (1): 25 × Eureka!
0 Votes 2 Answers 234 Views
Was there ever a solution to this request? https://faq.clear.ml/question/1546665636485140480/is-there-any-way-to-change-the-x-[…]s-to-say-e-g-epochs-instead-...
4 months ago
0 Votes 17 Answers 309 Views
Hi all, I'm trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. ...
4 months ago
0 Votes 13 Answers 379 Views
4 months ago
0 Hi all, I'm trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. Using their training module to control training. In the training args, I have the

It would seem they are related, but I can't see the further details of this bug. Either doing a manual artefact upload with the task, or turning TensorBoard tracking off in the HuggingFace trainer, both seemed to enable JSON tracking within the checkpoints. But I would have thought the TensorBoard behavior wasn't desired.
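
For context, the "manual artefact upload with the task" was roughly the following; a minimal sketch, assuming task is the current ClearML task, and the artifact name and checkpoint folder path are just illustrative:

from clearml import Task

# Sketch: manually attaching a Trainer checkpoint folder to the ClearML task.
# The artifact name and local path below are placeholders.
task = Task.current_task()
task.upload_artifact(
    name="checkpoint-198",
    artifact_object="somemodel/checkpoint-198",  # folder written by the HF Trainer
)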

4 months ago
0 Hi all, I'm trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. Using their training module to control training. In the training args, I have the

So turning report_to="tensorboard" off seemed to solve the issue... as in, the training run saves checkpoints as you would expect. That doesn't seem like desired behavior.
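
For reference, a minimal sketch of what "turning it off" looks like in the Trainer setup; the other arguments here are placeholders, not our actual configuration:

from transformers import TrainingArguments

# Sketch: disabling the TensorBoard reporter that appeared to interfere
# with what ends up inside the checkpoint folders.
training_args = TrainingArguments(
    output_dir="somemodel",      # placeholder
    save_strategy="epoch",       # placeholder
    report_to="none",            # instead of report_to="tensorboard"
)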

4 months ago
0 Hi all, I wanted to know about saving datasets. We want to specify the path to GS by default; as I understand, by default it uses the path to the file_server? We tried sdk.development.default_output_uri =

I ran into trouble with this. I found that for saving data you need to have it specified in the conf, even though, as far as I'm aware, setting it as part of a task is supposed to overwrite this.

Further, I found that the server wasn't able to resolve itself as a destination without providing an alias to the server name in the server-side docker.

Finally, when it comes to saving artifacts, it seems this had to be set in task.output_uri and not in the create or init :man-shrugging:
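
To make that concrete, a rough sketch of the two places that ended up mattering; the bucket path and project/task names are placeholders:

# In clearml.conf on the machine running the code (placeholder bucket path):
# sdk.development.default_output_uri: "gs://my-bucket/clearml"

from clearml import Task

task = Task.init(project_name="project", task_name="task")
# Setting it on the task object, rather than passing it to create/init,
# is what took effect for us.
task.output_uri = "gs://my-bucket/clearml"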

4 months ago
0 Hi all, I'm trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. Using their training module to control training. In the training args, I have the

So we have managed to get whole checkpoint files to save by removing the save_total_limit from training; this seems to save checkpoint folders with all files in them. However, now we have a ballooning server.

Did discover this None
and am wondering if there's some nuance in autotracking that needs to be circumvented.
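
One knob that may be relevant here (a sketch, not something we have confirmed): ClearML's framework auto-logging can be disabled selectively at Task.init, e.g. just the TensorBoard hook:

from clearml import Task

# Sketch: keep ClearML's automatic tracking but switch off the TensorBoard
# integration specifically. Project/task names are placeholders.
task = Task.init(
    project_name="project",
    task_name="training",
    auto_connect_frameworks={"tensorboard": False},
)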

4 months ago
0 Hi all, I'm trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. Using their training module to control training. In the training args, I have the

console output:

clearml.Task - INFO - Completed model upload to file_server/training.e5f99149b9b/models/optimizer.pt
clearml.Task - INFO - Completed model upload to file_server/training.e5f99149b9b/models/scheduler.pt
clearml.Task - INFO - Completed model upload to file_server/training.e5f99149b9b/models/rng_state.pth
save_model
somemodel/checkpoint-198
clearml.Task - INFO - Completed model upload to file_server/training.e5f99149b9b/models/training_args.bin
4 months ago
0 Hi all, I'm trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. Using their training module to control training. In the training args, I have the

called with:


task = Task.init(
    project_name=project_name, task_name=task_name, output_uri="fileserver_address"
)


task.connect(config)


checkpoint = config.get("model_path")

image_processor = AutoImageProcessor.from_pretrained(
    checkpoint,
    num_labels=config.get("class_number"),
)

best_model = training(checkpoint, image_processor)
4 months ago
0 Was there ever a solution to this request?

In the UI, in the dashboard. I know I could create my own custom plot and track it, but it seems odd not to have epoch as a configurable option.
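
The "custom plot" workaround would look roughly like this; a minimal sketch, where the per-epoch values and metric names are hypothetical:

from clearml import Logger

# Sketch: report a metric manually with the epoch number as the iteration
# axis, so the dashboard plot is indexed by epoch rather than step.
logger = Logger.current_logger()
for epoch, val_loss in enumerate(epoch_losses):   # epoch_losses: hypothetical per-epoch values
    logger.report_scalar(title="val", series="loss", value=val_loss, iteration=epoch)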

4 months ago
0 Hi all, I'm trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. Using their training module to control training. In the training args, I have the

queued with:

task = Task.create(
    project_name="name",
    task_name="training",
    repo="repo",
    branch="branch",
    script="training_script",
    packages=package_list,
    docker="docker_gpu_image",
    docker_args=["--network=host"],
)
task.output_uri = "filer_server"
task.enqueue(task, "training")
4 months ago
0 Hi everyone, I'm having trouble setting my output_uri such that my model checkpoints are saved outside of the venv and accessible via ID or for download. I'm running ClearML on a remote server through Docker and I believe ClearML is unable to resolve the

Sure!

So this is how I'm queuing the job:

task = Task.create(
    project_name="multiclass-classifier",
    task_name="training",
    repo="reponame_url",
    branch="branch_name",
    script='training_script_name',
    packages=package_list,
    docker="python:3.11",
    docker_args="--privileged"

)
task.enqueue(task, "services")  # services queue is the one with a remote worker
4 months ago
0 Hi all, I'm trying to save my model checkpoints during runtime but am running into a confusing snag. I'm using the HuggingFace architecture for a transformer. Using their training module to control training. In the training args, I have the

training function:

def training(checkpoint, image_processor):

    data_test_train, labels, label_to_id, id_to_label = pre_process()

    model = AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        id2label=id_to_label,
        label2id=label_to_id,
    )
    def metrics(eval_pred):
        metric_val = config.get("eval_metric")
        metric = evaluate.load(metric_val)
        predictions, labels = eval_pred
        ...
4 months ago
0 Hi everyone, I'm having trouble setting my output_uri such that my model checkpoints are saved outside of the venv and accessible via ID or for download. I'm running ClearML on a remote server through Docker and I believe ClearML is unable to resolve the

This issue was solved by adding task.output_uri = "fileserver" in the scheduling script, but for some reason this does not work when set in the Task.create call in the same script; it needs to be set afterwards. It also doesn't work when set in the training script, so there must have been some unknown overriding.
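
In other words, the pattern that worked in the scheduling script looked roughly like this; a sketch only, with placeholder names matching the snippets above:

from clearml import Task

task = Task.create(
    project_name="multiclass-classifier",   # placeholders
    task_name="training",
    repo="reponame_url",
    branch="branch_name",
    script="training_script_name",
)
# Setting output_uri here, after Task.create, is what took effect;
# per the note above, passing it inside Task.create or setting it in
# the training script did not.
task.output_uri = "fileserver_address"
Task.enqueue(task, queue_name="services")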

4 months ago
0 Hi everyone, I'm having trouble setting my output_uri such that my model checkpoints are saved outside of the venv and accessible via ID or for download. I'm running ClearML on a remote server through Docker and I believe ClearML is unable to resolve the

This is how I'm initializing before calling my training function; this is inside my training_script:

task = Task.init(project_name=project_name, task_name=task_name, output_uri=True) 
4 months ago
0 Hi everyone, I'm having trouble setting my output_uri such that my model checkpoints are saved outside of the venv and accessible via ID or for download. I'm running ClearML on a remote server through Docker and I believe ClearML is unable to resolve the

local conf

api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: 

    web_server: 

    files_server: 

    # Credentials are generated using the webapp, 

    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "somekey", "secret_key": "somekey"}
}

 # Default Task output_uri. if output_uri is not provided to Task.init, default_outp...
4 months ago
0 Hi everyone, I'm having trouble setting my output_uri such that my model checkpoints are saved outside of the venv and accessible via ID or for download. I'm running ClearML on a remote server through Docker and I believe ClearML is unable to resolve the

The way I'm attempting to access it with an ID:

cl_model_id = config.get("model_id")
# model = Model(model_id=cl_model_id)
model = InputModel(model_id=cl_model_id)
checkpoint = model.get_local_copy()
4 months ago
0 Hi everyone, I'm having trouble setting my output_uri such that my model checkpoints are saved outside of the venv and accessible via ID or for download. I'm running ClearML on a remote server through Docker and I believe ClearML is unable to resolve the

Which logs are helpful? Console output, fileserver, or API?

My main issue is that I can see the model artefacts are here: file:///root/.clearml/venvs-builds/3.11/task_repository/my_awesome_facility_model/checkpoint-33/scheduler.pt
which I believe is not persistent/retrievable with an artefact ID.

4 months ago
0 Hi everyone, I'm having trouble setting my output_uri such that my model checkpoints are saved outside of the venv and accessible via ID or for download. I'm running ClearML on a remote server through Docker and I believe ClearML is unable to resolve the

docker compose on server:


  apiserver:
    command:
    - apiserver
    container_name: clearml-apiserver
    image: allegroai/clearml:latest
    privileged: true
    restart: unless-stopped
    volumes:
    - ${LOGS_DIR}:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    - ${FILESERVER_DATA_DIR}:/mnt/fileserver
    depends_on:
      - redis
      - mongo
      - elasticsearch
      - fileserver
    environment:
      CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
      C...
4 months ago