So the way it works: when you run a component, the return value along with the entire function execution is cached. Basically:
this did NOT add the artifact to the pipeline via caching on subsequent runs ❌
you just need to do:
```python
PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
return Task.current_task().artifacts['images'].url
```
This will return the URL of the uploaded images (i.e. S3 bucket)
which means if this is cached you will get it...
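Putting it together, a minimal sketch of the whole component (the component name and the cache flag are my assumptions, not from the original snippet):
```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=True)  # assumption: caching enabled for this step
def prepare_images(img_dir):
    # upload the folder as an artifact, and return its URL so the cached
    # return value still points at the stored copy on subsequent runs
    PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
    return Task.current_task().artifacts['images'].url
```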
I have an idea, can you try with:
```python
task = Task.init(..., reuse_last_task_id=False)
```
I have a suspicion it starts the Tasks in parallel, and the "reuse_last_task_id" causes them to "reuse the same task locally" which makes them overwrite the configuration of one another.
PanickyMoth78 thank you for the mock code, I can verify it reproduces the issue. It seems that for some reason (bug) when the same function is called multiple times it "collects" parents, hence the odd graph.
BTW: if you want to see exactly what is passed to the step you can press on the step's full_details, and see the hyperparameter section.
I'll make sure we fix this bug in the next RC.
hmm can you share the log of the Task? (the clearml-session created Task)
Hi FancyWhale93 you can disable the auto model uploading with:
```python
@PipelineDecorator.component(..., auto_connect_frameworks={'pytorch': False})
def step():
    pass
```
That wasn't scheduled by ClearML.
This means that from ClearML's perspective they are "manual", i.e. the job itself (by calling Task.init) creates the experiment in the system and fills in all the fields.
But for a k8s job, I'm still unsuccessful.
HelpfulDeer76 When you say "unsuccessful", what exactly do you mean?
Could it be they are reported to the clearml demo server (the default server if no configuration is found)?
Great! btw: final v1.2.0 should be out after the weekend
ZanyPig66 you are correct in your assumptions. What exactly do you have in the Task? If there is no git repo, the entire script should be under "uncommitted changes". What is your case?
is everything on the same network?
Hi SubstantialElk6
ClearML-Data doesn't actually "load" the data; it brings it locally and returns a folder with all your data files. From that point onward, it's up to your code to load it into the framework. Make sense?
That should not be complicated to implement. Basically you could run `clearml-task execute --id <task_id>` as the SageMaker cmd. Can you manually launch it on SageMaker?
Hi @<1578193378640662528:profile|MoodySeaurchin4>
but is it possible to log some metrics too, like rmse or the likes? If so, how would you do it?
Sure. I'm assuming this is part of the output? If not, it means it is part of your code, and in that case, yes, you should use collect_custom_statistics_fn.
`collect_custom_statistics_fn({'rmse'...
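As a rough sketch of where that call would live (following the clearml-serving Preprocess layout; the metric value here is a placeholder):
```python
# preprocess.py for a clearml-serving endpoint (sketch)
class Preprocess(object):
    def postprocess(self, data, state, collect_custom_statistics_fn=None):
        if collect_custom_statistics_fn:
            # report a custom metric alongside the automatically collected statistics
            collect_custom_statistics_fn({'rmse': float(data.get('rmse', 0.0))})  # placeholder value
        return data
```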
Can you put the task.connect line here? (btw: I would assume there is no need for an additional connect if using hydra+fire, no?)
Hi SubstantialElk6
Could you test with the latest RC6?
`pip install clearml==0.17.5rc6`
Hi AstonishingRabbit13
is there option to omit the task_id so the final output will be deterministic and know prior to the task run?
Actually no 😞 the full path is unique for the run, so you do not end up overwriting models.
You can get the full path from the UI (Models Tab) or programmatically with Model.query_models or the Task.get_task methods.
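For example, a minimal sketch of the programmatic route (the project/model names and the task id are hypothetical):
```python
from clearml import Model, Task

# look the model up directly
models = Model.query_models(project_name='my_project', model_name='my_model')
print(models[0].url)  # full storage path of the uploaded model

# or resolve it through the task that produced it
task = Task.get_task(task_id='<task-id>')
print(task.models['output'][-1].url)
```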
What's the idea behind a fixed location for the model?
The problem is that I currently don't have a way to get them "from outside".
Maybe as a hack (until we add the model object):
```python
from clearml.binding.frameworks import WeightsFileHandler

class MyModelCB:
    current_args = dict()

    @classmethod
    def callback(cls, load_save, model_info):
        # only rename when a model is being saved
        if load_save != "save":
            return model_info
        # build the model name from the currently stored args
        model_info.name = "my new name " + str(cls.current_args)
        return model_info

WeightsFileHandler.add_pre_callback(MyModelCB.callback)
MyModelCB.current_args = {"args": "value"}
```
wdyt?
I think my main point is: the k8s glue on AKS or GKE basically takes care of spinning new nodes, as the k8s service does that. The AWS autoscaler is kind of a replacement. Make sense?
Hi TartBear70
I'm setting up reproducibility myself but when I call Task.init() the seed is changed
Correct
Is it possible to tell clearml not to initialize any rng? It appears that task.set_random_seed() doesn't change anything.
I think this is now fixed (meaning should be part of the post weekend release)
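Once that release is out, something like this should work (a sketch; I'm assuming passing None disables the seeding, and that it must be called before Task.init):
```python
from clearml import Task

# assumption: None disables ClearML's automatic RNG seeding
Task.set_random_seed(None)
task = Task.init(project_name='my_project', task_name='repro-test')
```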
Is this documented?
Hmm, I'm not sure (actually we should write it down, maybe in the Task.init docstring?)
Specifically the function that is being called is:
https://gi...
How can I specify the agent to use a specific conda environment inside the docker?
Hi CrookedWalrus33
By default it will pick the highest python version in the PATH.
Then, if there is a python version (in PATH) that matches the one requested on the Task, it will look for it.
Do you want to limit it to a specific python binary?
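If you do, a sketch of the relevant clearml.conf entry on the agent (the path is an assumption, point it at the interpreter inside your docker):
```
agent {
    # force the agent to use a specific python interpreter
    python_binary: "/opt/conda/envs/myenv/bin/python"
}
```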
Still, my problem is that calling `pipe.start()` crashes.
is supposed to kill the process
```
2022-08-19 09:17:56,626 - clearml - WARNING - Terminating local execution process
```
This is what it writes before killing the local process:
```
/opt/homebrew/anaconda3/envs/py39/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 16 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be ...
```
Basically there are two options. The first is to spin up the clearml-k8s-glue as a k8s service;
this service takes ClearML jobs and creates k8s jobs on your cluster.
The second option is to spin agents inside pods statically; then, inside the pods, the agents work in venv mode.
I know the enterprise edition has more sophisticated k8s integration, where the glue also retains the ClearML scheduling capabilities.
https://github.com/allegroai/clearml-agent/#kubernetes-integration-optional
Hi HelpfulDeer76
I mean that the task was being monitored on the demo ClearML server created by Allegro
Yes that is consistent with what I would expect to have happened
Basically if you are running it as a k8s job, you can just configure the following environment variables:
```
CLEARML_WEB_HOST:
CLEARML_API_HOST:
CLEARML_FILES_HOST:
CLEARML_API_ACCESS_KEY: <clearml access>
CLEARML_API_SECRET_KEY: <clearml secret>
```
function and just seem to be getting an "isadirectory" error?
Can you post here what you are getting? Which clearml version are you using?!
also tried manually adding `leap==0.4.1` in the task UI, which didn't work.
That has to work. If it did not, can you send the log of the failed Task (or the Task that did not install it)?
The environment in the logs does show that leap is being installed potentially from a cache?
- leap @ file:///opt/keras-hannd...
The AWS autoscaler will work with IAM rules as long as you have them configured on the machine itself. For SageMaker job scheduling (I'm assuming this is what you are referring to, and not the notebook) you need to select the instance as well (basically the same as EC2). What do you mean by using the k8s glue, like inherit and implement the same mechanism but for SageMaker instead of kubectl?
SIGINT (Ctrl-C) only.
Because flushing the state (i.e. sending requests) might take time, we only do that when users interactively hit Ctrl-C. Make sense?
BoredHedgehog47
is this ( https://clearml.slack.com/archives/CTK20V944/p1665426268897429?thread_ts=1665422655.799449&cid=CTK20V944 ) the same issue (or solution) ?
HugeArcticwolf77 oh no, I think you are correct 😞
Do you want to quickly PR a fix ?
Yes clearml is much better 🙂
(joking aside, mlops & orchestration in clearml is miles better)
CheerfulGorilla72 What are you looking for?
In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?
Yes exactly! it should be very easy
Just inherit from RandomSearch and override create_job:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
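A minimal sketch of that (the subclass name and the extra logic are placeholders):
```python
from clearml.automation import RandomSearch

class MyRandomSearch(RandomSearch):
    def create_job(self):
        # let RandomSearch sample a hyperparameter combination and clone the base task
        job = super().create_job()
        # ... custom logic here, e.g. adjust or veto the sampled parameters ...
        return job
```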
GloriousPenguin2 Hmm, the UI might strip it?! I mean, in most cases it should not be there in the first place. Maybe we need to make sure that, if provided, the web UI uses the stored Plotly definition. If that is the case, we should also make sure that by default we do not store it, so in most cases the UI can still improve the layout. wdyt?