
Hi @<1523715429694967808:profile|ThickCrow29> , thank you for pinging!
We fixed the issue (hopefully); can you verify with the latest RC, 1.14.0rc0?
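For reference, if the fix landed in the clearml Python package (an assumption, the thread does not name the package), installing the RC is just pinning the pre-release version:
```
pip install clearml==1.14.0rc0
```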
LOL that's the spirit, making your team happy is key to success in adoption 🙂
But that should not mean you cannot write to them, no?!
Hi ZippySheep23
Any ideas what might be happening?
I think you passed the upload limit (2.36 GB) 🙂
Could you send the logs?
I can verify the behavior; I think it has to do with the way the subparser was set up.
This was the only way for me to get it to run:
```
script.py test blah1 blah2 blah3 42
```
When I passed specific arguments (for example --steps) it ignored them...
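A minimal sketch of the kind of subparser setup that behaves this way (hypothetical names, reconstructing the scenario rather than the user's actual script):
```python
import argparse

parser = argparse.ArgumentParser()
# option defined on the parent parser
parser.add_argument("--steps", type=int, default=1)

subparsers = parser.add_subparsers(dest="command")
# a "test" subcommand taking only positional arguments
test_parser = subparsers.add_parser("test")
test_parser.add_argument("values", nargs="*")

args = parser.parse_args()
# with subparsers, parent-level options must appear *before* the subcommand:
#   script.py --steps 5 test blah1 blah2 blah3 42   # --steps parsed
#   script.py test blah1 blah2 blah3 42 --steps 5   # --steps rejected/ignored
print(args)
```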
Hi YummyFish22
Looks like the task does not have a Task.init call on the main script (or an import of clearml)? Could that be the case?
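For context, the minimal instrumentation ClearML expects at the top of the entry script is just (project/task names are placeholders):
```python
from clearml import Task

# creating (or attaching to) a task enables the automatic logging and remote execution
task = Task.init(project_name="my project", task_name="my experiment")
```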
LazyTurkey38 notice the assumption is that the docker entry-point ends with bash, and only then the agent takes charge. I'm assuming this is not the case, hence the agent spins the docker, then the docker just ends. Could that be?
clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
If the user running this command can run "docker run", then you should be fine
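As a quick check, that user should be able to run the standard docker smoke test:
docker run --rm hello-world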
is no agent listening to the "k8s_scheduler"
There should not be one, this is purely "virtual", so users understand the k8s cluster is spinning their pod (sometimes it takes time, imagine EKS etc.; it's just for visibility)
unfortunately I can't get info from the cluster
You should be able to see the pod in the cluster, no?!
What does the Task Info panel say? Can you share a screenshot?
I have a model and hundreds of thousands of inference records for that model.
What would the query be? Are you reporting 100+ different scalars?
Hi PanickyMoth78
So the current implementation of the pipeline parallelization is exactly like Python async function calls:
```
for dataset_conf in dataset_configs:
    dataset = make_dataset_component(dataset_conf)
    for training_conf in training_configs:
        model_path = train_image_classifier_component(training_conf)
        eval_result_path = eval_model_component(model_path)
```
Specifically here, since you are passing the output of one function to another, what happens is a wait operation, hence it ...
Hi UnevenDolphin73
You mean this part?
https://github.com/allegroai/clearml-agent/blob/5afb604e3d53d3f09dd6de81fe0a494dacb2e94d/docs/clearml.conf#L212
(In other words, "the Task's Environment section" is a bit unclear)
Yes we should expand, but generally you are correct, it should work as you described 🙂
Hi UnsightlySeagull42
does anyone know how this works with git ssh credentials?
These will be taken from the host ~/.ssh folder
Actually unless you specifically detached the matplotlib automagic, any plt.show() will be automatically reported.
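A minimal sketch (placeholder project/task names) showing the automagic capture, and the switch that detaches it:
```python
from clearml import Task
import matplotlib.pyplot as plt

task = Task.init(project_name="examples", task_name="matplotlib demo")

plt.plot([0, 1, 2], [0, 1, 4])
plt.show()  # intercepted by the matplotlib binding and reported to the task

# to opt out instead, disable the binding at init time:
# Task.init(..., auto_connect_frameworks={"matplotlib": False})
```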
maybe we should add some ENV setting for it? (I'm not sure we should disable SSL for all S3 connections... so somehow specify the minio host it should use http with)
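For reference, a per-host entry in clearml.conf can already pin a specific minio endpoint to http; a sketch with placeholder host/credentials (check your conf version for the exact schema):
```
aws {
    s3 {
        credentials: [
            {
                host: "my-minio:9000"   # hypothetical minio endpoint
                key: "minio-access-key"
                secret: "minio-secret-key"
                multipart: false
                secure: false           # use http instead of https for this host only
            }
        ]
    }
}
```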
The upload itself is in the background.
It should not take long to prepare the plot for sending. Are you experiencing a major delay ?
ShallowGoldfish8 how did you get this error?
```
self.Node(**eager_node_def)
TypeError: __init__() got an unexpected keyword argument 'job_id'
```
Hi @<1634001100262608896:profile|LazyAlligator31>
Is this because the code repo is being recreated in this directory?
Yes this is correct 🙂
Basically the entire code base + venv is installed there, to make sure it does not interfere with the "system" preinstalled environment
(it also allows for caching on the host machine 🙂)
CheerfulGorilla72
upd: I see NAN in the tensorboard, and 0 in Clearml.
I have to admit, since NaNs are actually skipped in the graph, should we actually log them?
assuming you have hparams.my_param
my suggestion is:
```
@hydra.main(config_path="solver/config", config_name="config")
def train(hparams: DictConfig):
    task = Task.init(hparams.task_name, hparams.tag)
    overrides = {'my_param': hparams.value}
    task.connect(overrides, name='overrides')
    # in remote this will print the value we put in "overrides/my_param"
    print(overrides['my_param'])
    # now we actually use overrides['my_param']
```
Make sense?
Notice Optuna will do TPE & Hyperband Bayesian optimization to find the best combination
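A sketch of wiring Optuna in through ClearML's optimizer (hypothetical base task id, metric names, parameter ranges, and queue):
```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

optimizer = HyperParameterOptimizer(
    base_task_id="<template-task-id>",      # hypothetical: the task cloned per trial
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-5, max_value=1e-1),
    ],
    objective_metric_title="validation",    # hypothetical metric title/series
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=OptimizerOptuna,        # Optuna drives the search strategy
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```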
GrumpyPenguin23 could you help and point us to an overview/getting-started video?
Hmm that should have worked ...
I'm assuming the Task itself is running on a remote agent, correct ?
Can you see the changes in the OmegaConf section ?
what happens when you pass
--args overrides="['dataset.path=abcd']"
In that case, when you create the Tasks for the steps, do not specify any packages/requirements; then the agent will just use the "requirements.txt" from the repository.
If you need you can also specify them when you create the Task itself see https://github.com/allegroai/clearml/blob/912f6f5ba2328b26de042de03f02de5802df360f/clearml/task.py#L608
https://github.com/allegroai/clearml/blob/912f6f5ba2328b26de042de03f02de5802df360f/clearml/task.py#L609
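Per the Task.create signature linked above, a sketch of specifying packages explicitly (repo, script, and package names are placeholders):
```python
from clearml import Task

task = Task.create(
    project_name="my project",                 # placeholders throughout
    task_name="pipeline step",
    repo="https://github.com/me/my_repo.git",
    script="steps/train.py",
    packages=["torch>=1.10", "pandas"],        # explicit requirements for the agent
)
```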
Okay I found the issue (I think).
If the images are reported very quickly, it will "decide" you are about to override the previous one (i.e. 101 -> overwriting 0, which makes sense; the bug was it would disable the 101 from uploading and not the 0 🙂)
Test fix: in /backend_interface/metrics/events.py, line 292, change:
```
last_count = self._get_metric_count(self.metric, self.variant, next=False)
if abs(self._count - last_count) > int(self._file_history_size):
    ...
```
Also what do you have in the "Configuration" section of the serving inference Task?
okay so the error should have been:
```
trains_agent: ERROR: Connection Error: it seems api_server is misconfigured. Is this the TRAINS API server http://<IP>:8008 ?
```
Not https nor 8010 ?!
By default the agent will add the root of the git repository into the pythonpath, so that you can import...
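To illustrate with a hypothetical repository layout:
```python
# hypothetical repository layout:
#   my_repo/
#     utils/
#       helpers.py
#     experiments/
#       train.py   <- the task's entry script
#
# since the repo root is on PYTHONPATH, train.py can simply do:
from utils import helpers
```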
p.s. StraightCoral86 I might be missing something here, please feel free to describe the entire execution scenario and what you are trying to achieve 🙂