Yes, experiments are standalone; they do not have to have any connecting thread.
When would you say it's a new "run" vs. a new "experiment"? When you change a parameter? Change the data? Change the code?
If you want to "bucket them", use projects 🙂 it is probably the easiest option now that we have support for nested projects.
I execute the `clearml-session` command with the `--docker` flag.
This flag controls the docker image the agent will spin up for you (think of the dev environment you want to work in, e.g. an nvidia pytorch container that already has everything you need).
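For example (the image name below is just an illustration; use whichever container fits your workflow):

```shell
clearml-session --docker nvcr.io/nvidia/pytorch:22.12-py3
```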
Hi @<1571308003204796416:profile|HollowPeacock58>
Could you share the full log?
task.update({'script': {'version_num': 'my_new_commit_id'}})
This will update the task to a specific commit id; you can pass an empty string '' to make the agent pull the latest commit from the branch.
Seems like a Task contained an invalid artifact link.
I wouldn't sweat over it, it's basically a warning that it could not locate the actual file to delete (albeit an ugly warning 🙂)
I think AnxiousSeal95 would know when will the new version be ready.
Regardless, is it actually deleting old Tasks?
Thanks @<1694157594333024256:profile|DisturbedParrot38> !
Nice catch.
Could you open a github issue so that at least we output a more informative error?
BTW:
Error response from daemon: cannot set both Count and DeviceIDs on device request.
Googling it points to a docker issue (which makes sense considering):
https://github.com/NVIDIA/nvidia-docker/issues/1026
What is the host OS?
Since I can't use the `torchrun` command (from my tests, clearml won't use it on the clearml-agent), I went with the ...
@<1556450111259676672:profile|PlainSeaurchin97> did you check this example?
What's the error you are getting?
Hi SteadyFox10, this one will get all the last metric scalars: `train_logger.get_last_scalar_metrics()`
Ok, no, it only helps as long as I don't log the figure.
You mean if you create the matplotlib figure without the automagic connect, you still see the memory leak?
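As a general matplotlib note (independent of ClearML's automagic capture, just a hedged sketch): explicitly closing figures after use prevents figure objects from accumulating in memory:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

def plot_and_release():
    # Create a figure, draw on it, then close it so its memory can be reclaimed
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [4, 5, 6])
    plt.close(fig)

plot_and_release()
print(plt.get_fignums())  # no figures left open
```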
UnevenDolphin73 following the discussion https://clearml.slack.com/archives/CTK20V944/p1643731949324449 , I suggest this change in the pseudo code
```python
# task code
task = Task.init(...)
if not task.running_locally() and task.is_main_task():
    # pre-init stage
    StorageManager.download_folder(...)  # Prepare local files for execution
else:
    StorageManager.upload_file(...)  # Repeated for many files needed
task.execute_remotely(...)
```
Now when I look at it, it kind of makes sense to h...
Regarding `restart_period_sec`: I'm assuming you mean `development.worker.report_period_sec`, correct?
The configuration does not seem to have any effect, scalars appear in the web UI in close to real time.
Let me see if we can reproduce this behavior and quickly fix it.
Hi @<1657918706052763648:profile|SillyRobin38>
Hi everyone, I wanted to inquire if it's possible to have some type of model unloading.
What do you mean by "unloading"? Do you mean removing it from the clearml-serving endpoint?
If this is from clearml-serving, then yes, you can do it online:
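For reference, a sketch of removing a serving endpoint with the clearml-serving CLI; the service ID and endpoint name here are placeholders:

```shell
clearml-serving --id <service_id> model remove --endpoint "test_model_endpoint"
```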
Yes
Are you trying to upload_artifact to a Task that is already completed?
Yes, I see no more than 114 plots in the list on the left side in full screen mode. Just checked, and the behavior exists on Safari and Chrome.
Let me check with the front-end guys 🙂
Hi DilapidatedDucks58 ,
Are you running in docker or venv mode?
Do the workers share a folder on the host machine?
It might be a syncing issue (not directly related to the trains-agent, but to the fact that you have 4 processes trying to simultaneously access the same resource)
BTW: the next trains-agent RC will have a flag (default off) for torch-nightly repository support 🙂
Oh, so the pipeline basically makes itself their parent. This means you can get their IDs:
```python
steps_ids = Task.query_tasks(task_filter=dict(parent='<pipeline_id_here>'))
for task_id in steps_ids:
    task = Task.get_task(task_id)
```
Verified @<1643060801088524288:profile|HarebrainedOstrich43>! An RC will be out soon for you to test. Thank you again for catching it; not sure how the internal tests missed it (btw the pipeline is created, it's just not shown in the right place due to an internal typo)
Thus, the return data from step 2 needs to be available somewhere to be used in step 3.
Yep 🙂
Will it serialize the data in the dict?
I thought it would just point to a local file location where you have the data 🙂
I didn't know that each step runs in a different process
Actually! You can run them as functions as well, try:
```python
if __name__ == '__main__':
    PipelineDecorator.debug_pipeline()
    # call pipeline function here
```
It will just run them as functions (ret...
Okay, this is odd, the request returned exactly 100 out of 100.
It seems not all of them were reported?!
Could you post the toy code, I'll check what's going on.
HappyDove3 where are you running the code?
(the upload is done in the background, but it seems the python interpreter closed?!)
You can also wait for the upload:
```python
task.upload_artifact(name="my artifact", artifact_object=np.eye(3, 3), wait_on_upload=True)
```
Thanks EnviousStarfish54 !
The Cloud Access section is in the Profile page.
Any storage credentials (S3 for example) are only stored on the client side (never on the trains-server); this is the reason we need to configure them in trains.conf. When the browser needs to access those URLs (e.g. downloading an artifact) it also needs the secret/key, so it automatically displays a popup requesting them and stores them in this section. Notice they are stored in the browser session (as a cookie).
Looks great, let me see if I can understand what's missing, because it should have worked ...
How can I add additional information, e.g. debug samples or scalars, to the data shown in the UI? Logger.current_logger() is not working
Yes 🙂
`dataset.get_logger()` to the rescue
Hi @<1566596960691949568:profile|UpsetWalrus59>
You should call it before initializing the Task:
```python
Task.ignore_requirements("pywin32")
task = Task.init(...)
```
Happy new year @<1618780810947596288:profile|ExuberantLion50>
- Is this the right place to mention such bugs?
Definitely the right place to discuss them; usually, once verified, we ask you to also add it in github for easier traceability / visibility
m (i.e. there's two plots shown side-by-side but they're actually both just the first experiment that was selected). This is happening across all experiments, all my workspaces, and all the browsers I've tried.
Can you share a screenshot? is this r...
Does the clearml module parse the python packages?
Yes, it analyzes the installed packages based on the actual imports you have in the code.
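To illustrate the idea (this is not ClearML's actual implementation, just a stdlib sketch of how import detection can work):

```python
import ast

def find_imports(source: str) -> set:
    """Collect top-level module names imported by a piece of source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy as np" -> "numpy"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from torch.nn import Linear" -> "torch"
            modules.add(node.module.split(".")[0])
    return modules

print(sorted(find_imports("import numpy as np\nfrom torch.nn import Linear")))
# prints ['numpy', 'torch']
```

The detected top-level names would then be matched against the installed package versions to build the requirements list.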
If I'm using a private pypi artifact server, would I set the PIP_INDEX_URL on the workers so they could retrieve those packages when that experiment is cloned and re-ran?
Correct 🙂 the agent basically calls pip install on those packages, so if you configure PIP_INDEX_URL it should just work like any other pip install
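Alternatively (assuming a default clearml.conf layout; verify the exact key against your agent version, and note the URL below is a placeholder), the agent's config file also accepts extra pip index URLs:

```conf
agent {
    package_manager {
        # Additional --extra-index-url entries the agent passes to pip install
        extra_index_url: ["https://my-private-pypi.example.com/simple"]
    }
}
```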