do you have a video showing the use case for clearml-session?
I totally think we should, I'll pass it along 🙂
what is the difference between VSCode via clearml-session and VSCode via the remote SSH extension?
Nice! remote vscode is usually thought of as SSH, basically you have your vscode running on your machine, and using SSH vscode automatically connects to the remote machine.
Clearml-Session also adds a new capability: VSCode inside your browser, where the VSCode itself as well...
This looks exactly like the timeout you are getting.
I'm just not sure what's the diff between the Model autoupload and the manual upload.
But this config should almost never need to change!
Exactly the idea 🙂
notice the password (initially random) is also fixed on your local machine, for the exact same reason
Can you please elaborate on the latter point? My jupyterhub's fully containerized and allows users to select their own containers (from a list I built) at launch, and launch multiple containers at the same time, not sure I follow how toes are stepped on.
Definitely a great start, usually it breaks on memory / GPU-mem where too many containers on the same machine are eating each other's GPU RAM (which cannot be virtualized)
Hi @<1566596960691949568:profile|UpsetWalrus59>
Could it be the two experiments have the exact same name?
(It sounds like a bug in the UI, but I'm trying to make sure, and also understand how to reproduce it)
What's your clearml-server version ?
Hi @<1523702932069945344:profile|CheerfulGorilla72>
the agent Always inherits from the docker's system-installed environment
If you have a custom venv inside the docker that is Not activated by default, you can set the agent to use it:
None
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL
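For example, a minimal sketch assuming the image ships a ready-made venv at /opt/venv (hypothetical path), and that the variable is visible to the agent process:

```
# hypothetical path: point the agent at the venv's python binary so it
# skips creating its own virtual environment and pip-installing into it
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/opt/venv/bin/python
clearml-agent daemon --queue default
```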
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana.
it will not delete it from Grafana, it means it's no longer collected. Make sense?
What's the clearml-server version ?
Hi @<1545216070686609408:profile|EnthusiasticCow4>
will ClearML remove the corresponding folders and files on S3?
Yes and it will ask you for credentials as well. I think there is a way to configure it so that the backend has access to it (somehow) but this breaks the "federated" approach
Hmm so the SaaS service ? and when you delete (not archive) a Task it does not ask for S3 credentials when you select delete artifacts ?
off the top of my head, the self-hosted is missing the autoscalers (there is an AWS CLI, but no UI or others), also missing the HPO UI feature,
but you should just check the detailed table here: None
Hi @<1541954607595393024:profile|BattyCrocodile47>
see here: None
Try with app.clearml.mlops-club.org
and the rest of them
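i.e. pointing all three endpoints in clearml.conf at the matching subdomains, something like this sketch (assuming the usual app/api/files subdomain layout, which may differ in your deployment):

```
api {
    web_server: https://app.clearml.mlops-club.org
    api_server: https://api.clearml.mlops-club.org
    files_server: https://files.clearml.mlops-club.org
}
```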
I'm getting: hydra_core == 1.1.1
What's the setup you have? python version, OS, Conda yes/no?
How do I best utilize clearml in this scenario such that any coworker of mine is able to reproduce my work with the same pipeline?
Basically this sounds to me like proper software development design (i.e. the class vs stages).
In order to make sure Anyone can reproduce it, you mean anyone can rerun the "pipeline"? If this is the case, just add Task.init (maybe use a specific Task type) and the agents will make sure this is Fully reproducible.
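A minimal sketch of that (project/task names are placeholders):

```python
from clearml import Task

# Task.init records the git commit, uncommitted changes, and installed
# packages, so an agent can later recreate the run end-to-end
task = Task.init(
    project_name="my-project",  # placeholder
    task_name="pipeline-run",   # placeholder
    task_type=Task.TaskTypes.data_processing,  # optional specific task type
)
```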
If you mean the data itself is stored, the...
Hi @<1562973095227035648:profile|ThoughtfulOctopus83>
The host should be just the host name, no https prefix, I'm assuming that's the issue
Hi @<1523702786867335168:profile|AdventurousButterfly15>
Make sure you pass output_uri=True in Task.init
It will automatically upload your model to the file server. You can also configure it in the clearml.conf, look for default_output_uri
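e.g. a minimal sketch (project/task names are placeholders):

```python
from clearml import Task

task = Task.init(
    project_name="examples",  # placeholder
    task_name="train-model",  # placeholder
    output_uri=True,          # upload models to the default file server
    # output_uri="s3://my-bucket/models"  # or point at your own storage
)
```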
Hi GrotesqueMonkey62 any chance you can be a bit more specific? Maybe a screen grab?
Here is how it works, if you look at an individual experiment scalars are grouped by title (i.e. multiple series on the same graph if they have the same title)
When comparing experiments, any unique combination of title/series will get its own graph, then the different series on the graph are the experiments themselves.
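To make the grouping concrete, a small sketch (names are placeholders): in a single experiment the two series below share one graph because they share the title "loss"; in a comparison, each unique title/series combination becomes its own graph.

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar-grouping")  # placeholders
logger = task.get_logger()

for i in range(10):
    # same title + different series -> two lines on the same graph
    logger.report_scalar(title="loss", series="train", value=1.0 / (i + 1), iteration=i)
    logger.report_scalar(title="loss", series="validation", value=1.5 / (i + 1), iteration=i)
```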
Where do you think the problem lies?
so the docker didn't use the DNS of the host?
I'm assuming it is not configured on your DNS, otherwise it would have been resolved...
Hmm yeah I can see why...
Now that I think about it, at least in theory the second process that torch creates should inherit from the main one, and as such Task.init is basically "ignored"
Now I wonder why your first version of the code did not work?
Could it be that we patched the argparser on the subprocess and that we should not have?
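A sketch of the pattern under discussion (assuming torch.multiprocessing and placeholder names; the expectation is that the child's Task.init returns the parent's task instead of creating a new one):

```python
import torch.multiprocessing as mp
from clearml import Task

def worker(rank: int):
    # inside a subprocess of an already-initialized process, Task.init
    # should be effectively ignored and return the main process's task
    task = Task.init(project_name="examples", task_name="spawn-demo")
    print(f"rank {rank} attached to task {task.id}")

if __name__ == "__main__":
    Task.init(project_name="examples", task_name="spawn-demo")  # main task
    mp.spawn(worker, nprocs=2)
```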
DefeatedMoth52 how many agents do you have running on the same GPU ?
- At its simplest, this could just mean checking that all of the steps and the pipeline itself have completed successfully (by checking their "Task status").
If a pipeline step ends with a "failed" status, an exception will be raised in the pipeline execution function; if the exception is not caught, the pipeline itself will also fail
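A minimal sketch of catching a failed step inside a decorator-style pipeline (names are placeholders):

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["value"])
def shaky_step():
    raise RuntimeError("simulated failure")

@PipelineDecorator.pipeline(name="demo-pipeline", project="examples", version="0.1")
def pipeline_logic():
    try:
        value = shaky_step()
        print(value)  # dereferencing the result surfaces the step's failure
    except Exception as ex:
        # without this except clause, the pipeline itself would also fail
        print(f"step failed: {ex}")

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline_logic()
```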
run pipeline_script.py which contains the pipeline code as decorators.
So in theory the following should actually work.
Let's assume you ...
JitteryCoyote63 the agent.cuda_version (or CUDA_VERSION env) tells the agent which pytorch wheel to download. The CUDNN library can be included inside any wheel and it will work as long as cuda / cudart exist on the system; for example, pytorch wheels include the cudnn they use. agent.cudnn_version should actually be deprecated, and is not actually used.
For future reference, dependency order:
1. Nvidia drivers
2. CUDA library and CUDA-runtime libraries (libcuda.so / libcudart.so)
3. CUDNN...
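e.g. pinning it via the environment variable mentioned above (version value is a placeholder):

```
# placeholder version: tells the agent which pytorch wheel flavor to resolve
export CUDA_VERSION=11.8
clearml-agent daemon --queue default
```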
2023-02-15 12:49:22,813 - clearml - WARNING - Could not retrieve remote configuration named 'SSH'
This is fine, it means it uses the default identity keys
The thing is - when I try to connect with normal SSH there are no issues
Now I'm lost, so when exactly do you see the issue ?
So from foo.mod import "translates" to foo-mod @ git+ None ... ?
Yes, that makes sense. But did you see the callback being executed? It seems it was supposed to, and then the next call would have been 2:30 hours later, am I missing something?
You can just spin another agent on the same machine 🙂
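e.g. (queue name is a placeholder):

```
# a second agent process on the same box, pulling from the same queue
clearml-agent daemon --queue default --detached
```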