Hmm I see what you mean. It is on the roadmap (ETA: the next version, 0.17; 0.16 is due in a week or so) to add multiple models per Task, so it will be easier to see the connections in the UI. I'm assuming this will solve the problem?
WackyRabbit7 my apologies for the lack of background in my answer 🙂
Let me start from the top: one of the goals of the trains-agent is to reproduce the "original" execution environment. Once that is done, it will launch the code and monitor it. In order to reproduce the original execution environment, trains-agent will install all the needed python packages, pull the code, and apply the uncommitted changes.
If your entire environment is python based, then virtual-environment mode is proba...
MuddySquid7 you mean you are creating them with TB ? or are you uploading them as debug images ?
Specifically in the ClearML UI, do you have it under "plots" tab or "debug samples" tab ?
ngrok to connect to the remote server at the office?
That makes sense, I guess this is the equivalent of using a VPN, from that point onward clearml-session can directly access the remote machine, right?
Hi MinuteWalrus85
This is a great question, and super important when training models. This is why we designed a whole system to manage datasets (including storage querying, balancing data, and caching). Unfortunately this is only available in the paid tier of Allegro... You are welcome to contact the sales guys: https://allegro.ai/enterprise/
🙂
you can run md5 on the file as stored in the remote storage (nfs or s3)
s3 is implementation specific (i.e. MinIO, Weka, Wasabi, etc. might not support it) and I'm actually not sure regarding nfs (I mean you can run it, but it actually means you are reading the data; that said, nfs by definition, I'm assuming, is relatively fast access)
wdyt?
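For example, a minimal sketch of how you could compute that checksum yourself (the mount path below is just a placeholder for wherever the file lives):

import hashlib

def file_md5(path, chunk_size=8 * 1024 * 1024):
    # compute the md5 of a file in chunks, so very big files are never fully loaded into memory
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# hypothetical NFS mount point, replace with your own storage path
print(file_md5("/mnt/nfs/datasets/images/sample_0001.png"))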
SmallDeer34 No worries, I'm happy to hear the issue disappeared 🙂
That makes sense...
Basically in the open-source version the approach is everyone sees everything for maximum transparency (and also ease of use). I know there are access-roles in the paid tier and vault for exactly these types of things...
Where do you currently save them? and how do you pass them to the remote machine ?
Now I need to figure out how to export that task id
You can always look it up 🙂
How come you do not have it?
mean? Is it not possible that I call code that is somewhere else on my local computer and/or in my code base? That makes things a bit complicated if my current repository is not somehow available to the agent.
I guess you can ignore this argument for the sake of simple discussion. If you need access to extra files/functions, just make sure you point the repo argument to their repo, and the agent will make sure your code is running from the repo root, with all the repo files under i...
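If it helps, here is a rough sketch of what I mean (the repo URL, branch and script path are placeholders, and I'm assuming the extra code lives in its own git repository):

from clearml import Task

# create a task that the agent will run from the root of the given repository,
# so all the repo files are available next to the entry-point script
task = Task.create(
    project_name="examples",
    task_name="run from external repo",
    repo="https://github.com/my_org/my_shared_code.git",  # placeholder repo
    branch="main",
    script="scripts/train.py",  # placeholder entry point, relative to the repo root
)

# enqueue it so any agent listening on the queue will pick it up
Task.enqueue(task, queue_name="default")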
note
/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py
is the correct path
So how come it is failing?
Can you also print sys.path just to be sure ?
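Something like this would do (just a quick sanity check):

import sys

# print the interpreter's import search paths, one per line,
# so we can see whether the repository root is actually on it
print("\n".join(sys.path))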
Add '/', like you would with a file system:
Task.init(project_name='main_project/sub_project', task_name='test')
Because we are working with very big files, having them stored at multiple locations is something we try to avoid
Just so I better understand, is this for storing files as part of a dataset, or as debug samples ?
In other words can two diff processes create the exact same file (image) ?
I can see the shape is [136, 64, 80, 80]. Is that correct?
Yes, that's correct. As for the name, just try input__0
Notice you also need to convert it to TorchScript
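Something along these lines should work (a minimal sketch; the model here is a dummy stand-in, and the input shape is taken from the example above):

import torch
import torch.nn as nn

# dummy model just for illustration, replace with your trained model
model = nn.Sequential(nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU())
model.eval()

# example input matching the [136, 64, 80, 80] shape from above
example_input = torch.randn(136, 64, 80, 80)

# trace the model into TorchScript and save the serialized file for serving
traced = torch.jit.trace(model, example_input)
traced.save("model.pt")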
OutrageousSheep60
I found the task in the UI - and in the UNCOMMITTED CHANGES part of the execution section there is "No changes logged"
This is the issue.
and then run the session via docker:
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
    --packages "clearml" "tensorflow>=2.2" "keras" \
    --queue MY_QUEUE \
    --verbose
Are you running clearml-session from your machine? (i.e. not from inside a docker)?...
So was the issue solved?
Change to:
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
and the same for the password.
You can also just set the environment variables before launching docker-compose, whatever is more convenient for you
Yes, the mechanisms under the hood are quite complex, the automagic does not come for "free" 🙂
Anyhow, your perspective is understood, and as you mentioned I think your use case might be a bit less common. Nonetheless we will try to come up with a solution (probably an argument for Task.init so you could specify a few more options for the auto package detection).
I think I found something, let me test my theory
VivaciousWalrus99
Yes this is odd:
1608392232071 spectralab:gpu0 DEBUG New python executable in /cs/usr/gal.hyams/.trains/venvs-builds/3.7/bin/python2
So it thinks it has python v3.7 but it is using python2 in the venv...
In your trains.conf file, set agent.python_binary to the python3.7 binary. It should be something like:
agent.python_binary=/path/to/python/python3.7
Hi JitteryCoyote63
Or even better: would it be possible to have support for HTML files as artifacts?
If you report html files as debug media they will be previewed, as long as the link is accessible.
You can check this example:
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
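In short, it boils down to something like this (a minimal sketch, assuming a report.html file already exists locally):

from clearml import Task

task = Task.init(project_name="examples", task_name="html reporting sketch")

# report a local HTML file as debug media; it will show up under the task's
# debug samples and be previewed as long as the link is accessible
task.get_logger().report_media(
    title="html report",
    series="summary",
    iteration=0,
    local_path="report.html",  # assumed to exist next to the script
)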
As artifacts, I think HTML files are also supported (maybe not previewed as nicely, but clickable).
Regarding the s3 link, I think you are supposed to get a popup window as...
Hi @<1523701295830011904:profile|CluelessFlamingo93>
What do you mean? what's the difference between ClearML server and self hosted? both are self hosted no?
GloriousPenguin2 hmm the UI might strip it?! I mean in most cases it should not be there in the first place. Maybe we need to make sure that if provided, the web UI will use the stored plotly definition. If this is the case, we need to make sure that by default we do not store it, so in most cases the UI can use it to improve the layout. wdyt?
Okay, I'll make sure we always quote " , since it seems to work either way.
We will release an RC soon, with this fix.
Sounds good?
I added the following to the clearml.conf file
the conf file that is on the worker machine ?
I was unable to reproduce, but I added a few safety checks. I'll make sure they are available on the master branch in a few minutes; could you maybe rerun after?
Okay this seems correct:
pytorch=1.8.0=py3.7_cuda11.1_cudnn8.0.5_0
I can't seem to find what's the diff between the two.
Give me a second let me check if I can reproduce it somehow.
Hi FrothyShark37
Can you verify with the latest version?
pip install -U clearml
Very odd, I still can't reproduce. This is just the cleanup service running without anything else ?
What's the clearml version it is using ?