Answered
Hello Everyone. I'm Getting Started With ClearML. I'm Trying HPO ATM And Have Successfully Run The Base Task. When Running The Clone Of The Base Task In One Of The Agents, I'm Getting The Following Error. Any Suggestions? TIA

Hello everyone. I'm getting started with ClearML. I'm trying HPO atm and have successfully run the base task. When running the clone of the base task in one of the agents, I'm getting the following error. Any suggestions? TIA

  
  
Posted one year ago

Answers 29


Thanks!
FYI: this section is not necessary if you have a clearml.conf file in ~/
`Task.set_credentials(api_host=" ", web_host=" ", files_host=" ", key='********************', secret='***********************')`
Let me check the code for a min

  
  
Posted one year ago

Hi YummyFish22
Looks like the task does not have a "Task.init" call in the main script (or an import of clearml). Could that be the case?
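
One quick way to check this on the script that the task lists as its Script Path is a static scan for a `Task.init(...)` call. This is only a rough sketch: the helper name `calls_task_init` is made up for illustration, and it only recognizes the literal form `Task.init(...)`, not aliased imports.

```python
import ast


def calls_task_init(source: str) -> bool:
    """Return True if the source contains a call written as Task.init(...)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the attribute call `Task.init`, where `Task` is a plain name.
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "init"
            and isinstance(func.value, ast.Name)
            and func.value.id == "Task"
        ):
            return True
    return False


script = "from clearml import Task\ntask = Task.init(project_name='p', task_name='t')\n"
print(calls_task_init(script))
```

If this returns False for the file the agent actually executes, the script being run is probably not the one you think it is.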

  
  
Posted one year ago

Thanks for your answer. By main script, you mean the base task or the agent?

  
  
Posted one year ago

The base task does have 'Task.init' and both have clearml installed

  
  
Posted one year ago

The one it is trying to execute, i.e. on the Task it shows as Script Path

  
  
Posted one year ago

yes, that's the base task which ran successfully. It does have both

  
  
Posted one year ago

AgitatedDove14 even the base task does not have any Arg named "input_train_data". The base task is self-contained, i.e. it downloads the training/eval data directly and has direct access to it.

  
  
Posted one year ago

Could it be the Args section of the task it clones does not have the "input_train_data" argument ?

  
  
Posted one year ago

no..I'm not.

  
  
Posted one year ago

Could you maybe point me to an example of HPO that uses transformers? Can't find anything online. Maybe I can compare it with my version. Many thanks

  
  
Posted one year ago

sorry, what do you mean :
manually edit it back to your code

  
  
Posted one year ago

nvm..I got this

  
  
Posted one year ago

image

  
  
Posted one year ago

Notice the args will be set on the connect call, so the check on whether they are empty should come after
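
A minimal sketch of that ordering, assuming `task.connect(...)` hands back the dict holding the effective values (defaults locally, overridden values when an agent runs the clone). `connect_then_validate` and `FakeTask` are illustrative names only; `FakeTask` is a local stand-in so the snippet runs without a ClearML server.

```python
def connect_then_validate(task, defaults):
    """Connect the defaults first, then check the values that came back."""
    # Assumption: connect() returns the dict with the effective values.
    connected = task.connect(defaults)
    params = connected if connected is not None else defaults
    # Only now is it meaningful to check for empty values.
    missing = [name for name, value in params.items() if value in (None, "")]
    if missing:
        raise ValueError("missing parameters after connect: %s" % missing)
    return params


class FakeTask:
    """Hypothetical stand-in for a task; `overrides` mimics injected values."""

    def __init__(self, overrides):
        self.overrides = overrides

    def connect(self, mutable):
        mutable.update(self.overrides)
        return mutable


config = connect_then_validate(FakeTask({"lr": 0.001}), {"epochs": 5, "lr": None})
print(config)
```

Checking `lr` before the connect call would have rejected the `None` default even though the override fills it in.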

  
  
Posted one year ago

ok thanks

  
  
Posted one year ago

YummyFish22 can you point to the huggingface example you are using?

  
  
Posted one year ago

AgitatedDove14
` import os

os.environ['LC_ALL'] = "C.UTF-8"
os.environ['LANG'] = "C.UTF-8"

from clearml import Task

CLEARML_PROJECT = 'Vodafone Sentiment full'
CLEARML_TASK = 'HPO_BASE_TASK'
os.environ["CLEARML_PROJECT"] = CLEARML_PROJECT
os.environ["CLEARML_TASK"] = CLEARML_TASK
os.environ['MPLBACKEND'] = "TkAg"

Task.set_credentials(
    api_host=" ",
    web_host=" ",
    files_host=" ",
    key='******************',
    secret='*********************'
)

task = Task.init(project_name=CLEARML_PROJECT, task_name=CLEARML_TASK)

import pandas as pd
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from torch.utils.data import Dataset,DataLoader
from transformers import TrainingArguments, Trainer,get_scheduler
from datasets import load_metric, load_dataset, Value, Dataset, load_from_disk
import re
import torch.nn as nn

from clearml import Dataset as ds
from datasets import load_from_disk
import os

artifact_dir = ds.get(dataset_name="vodafone_dataset_preprocessed_train_dataloader", dataset_project="vodafone dataset_full").get_local_copy()
print(os.listdir(artifact_dir))
train_dataset = load_from_disk(os.path.join(artifact_dir, 'data'))

artifact_dir = ds.get(dataset_name="vodafone_dataset_preprocessed_test_dataloader", dataset_project="vodafone dataset_full").get_local_copy()
print(os.listdir(artifact_dir))
eval_dataset = load_from_disk(os.path.join(artifact_dir, 'data'))

print(input_train_data, input_test_data)
print(train_dataset.shape)
print(eval_dataset.shape)

example_configuration = {
    'epochs': 5,
    'lr': 0.00001,
    'optimizer': 'adam'
}

task.connect(example_configuration)

model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/twitter-xlm-roberta-base-sentiment")
tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-xlm-roberta-base-sentiment")

metric = load_metric('f1')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels, average='micro')

torch.cuda.empty_cache()

training_args = TrainingArguments(
    num_train_epochs=example_configuration['epochs'],
    logging_steps=5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    evaluation_strategy="epoch",
    learning_rate=example_configuration['lr'],
    save_steps=250,
    save_total_limit=2,
    output_dir='output/',
    report_to='clearml',
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

trainer.train() `

  
  
Posted one year ago

Okay great, so we do have the Args section there.
What do you have in the "Execution" tab?

  
  
Posted one year ago

no..I don't provide any extra arguments

  
  
Posted one year ago

This part is odd:
`SCRIPT PATH: tmp.7dSvBcyI7m`
How did you end up with this random filename? How are you running this code?

  
  
Posted one year ago

this is on kubeflow pipelines

  
  
Posted one year ago

Now that I have shared this with you..I finally saw that kubeflow is injecting this argparse stuff

  
  
Posted one year ago

ohhh yes and that is the issue 😞

  
  
Posted one year ago

I mean you can run it with kubeflow, but it kind of ruins the auto detection there
You can however clone and manually edit it back to your code, that would work
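
To illustrate the kind of failure this can cause: the sketch below is a guess at what an injected wrapper might look like, not the code Kubeflow actually generated (that is not shown in this thread). The argument names echo the ones mentioned above, and `build_injected_parser` and `try_parse` are hypothetical helpers. A parser with required arguments exits as soon as the script is relaunched without them, which is what happens when an agent runs the clone.

```python
import argparse


def build_injected_parser():
    """Hypothetical parser of the sort a pipeline wrapper might inject."""
    parser = argparse.ArgumentParser(prog="tmp_wrapper")
    parser.add_argument("--input_train_data", required=True)
    parser.add_argument("--input_test_data", required=True)
    return parser


def try_parse(argv):
    """Return the parsed arguments as a dict, or None if argparse exits."""
    try:
        return vars(build_injected_parser().parse_args(argv))
    except SystemExit:
        # argparse prints a usage error and exits when a required
        # argument is missing.
        return None


# Launched by the pipeline: both values are supplied.
print(try_parse(["--input_train_data", "a", "--input_test_data", "b"]))
# Relaunched with no arguments, as a cloned task might be.
print(try_parse([]))
```

Pointing the cloned task back at the original script removes the wrapper, and the required arguments with it.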

  
  
Posted one year ago

AgitatedDove14 I followed the above format but it still does not work. I'm getting increasingly sure that this is related to huggingface's Trainer API. Can you share an example of using huggingface's Trainer API if possible? TIA

  
  
Posted one year ago

also hpo controller:
` import os
from clearml import Task

os.environ['MPLBACKEND'] = "TkAg"

CLEARML_PROJECT = "Vodafone Sentiment full"
CLEARML_TASK = "HPO optimizer Controller"
os.environ["CLEARML_PROJECT"] = CLEARML_PROJECT
os.environ["CLEARML_TASK"] = CLEARML_TASK

Task.set_credentials(
    api_host=" ",
    web_host=" ",
    files_host=" ",
    key='88888888888',
    secret='888888888888888'
)

from clearml.automation import UniformParameterRange, UniformIntegerParameterRange, DiscreteParameterRange
from clearml.automation import HyperParameterOptimizer
from clearml.automation import GridSearch

from clearml import Task

task = Task.init(project_name=CLEARML_PROJECT,
                 task_name=CLEARML_TASK,
                 task_type=Task.TaskTypes.optimizer,
                 reuse_last_task_id=False)

optimizer = HyperParameterOptimizer(
    # specifying the task to be optimized, task must be in system already so it can be cloned
    base_task_id=base_task,
    # setting the hyper-parameters to optimize
    hyper_parameters=[
        UniformIntegerParameterRange('General/epochs', min_value=2, max_value=12, step_size=5),
        UniformParameterRange('General/lr', min_value=0.000001, max_value=0.0001, step_size=0.002),
    ],
    # setting the objective metric we want to maximize/minimize
    objective_metric_title='f1',
    objective_metric_series='eval',
    objective_metric_sign='max',
    # setting optimizer
    optimizer_class=GridSearch,
    # configuring optimization parameters
    execution_queue='default',
    max_number_of_concurrent_tasks=4,
    optimization_time_limit=60.,
    compute_time_limit=120,
    total_max_jobs=20,
    min_iteration_per_job=0,
    max_iteration_per_job=15,
)

optimizer.set_report_period(1)

# start the optimization process
# this function returns immediately
optimizer.start()

# set the time limit for the optimization process (2 hours)
optimizer.set_time_limit(in_minutes=120.0)

# wait until process is done (notice we are controlling the optimization process in the background)
optimizer.wait()

# optimization is completed, print the top performing experiments id
top_exp = optimizer.get_top_experiments(top_k=3)
print([t.id for t in top_exp])

# make sure background optimization stopped
optimizer.stop() `
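
As a side check on the ranges above, here is a plain-Python sketch of which grid points a min/max/step range yields when points are taken as `min + k * step` up to the max. This is not ClearML's own implementation, and how the library treats the max value is not shown in this thread; `grid_values` is an illustrative helper.

```python
import itertools
import math


def grid_values(min_value, max_value, step_size):
    """List min_value + k * step_size for every k that stays within max_value."""
    # The small epsilon guards against float rounding at the upper bound.
    steps = math.floor((max_value - min_value) / step_size + 1e-9)
    return [min_value + k * step_size for k in range(steps + 1)]


epochs = grid_values(2, 12, 5)
learning_rates = grid_values(0.000001, 0.0001, 0.002)
grid = list(itertools.product(epochs, learning_rates))

print(epochs)
print(learning_rates)
print(len(grid))
```

Under this reading, the `lr` step of 0.002 is larger than the whole range (about 0.0001), so only the minimum learning rate is explored; a step such as 0.00001 would give the grid more than one value.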

  
  
Posted one year ago

"The base task is self-contained, i.e. it downloads the training/eval data directly and has direct access to it"

I think this is the main issue. How come it does not catch it? Are you using argparse?

  
  
Posted one year ago

When you are running the base task, are you providing any arguments to it?
Can you share the "Execution" tab and the Args tab of the base task?

  
  
Posted one year ago