Hi TartBear70
I'm setting up reproducibility myself but when I call Task.init() the seed is changed
Correct
. Is it possible to tell clearml not to initialize any rng? It appears that task.set_random_seed() doesn't change anything.
I think this is now fixed (meaning should be part of the post weekend release)
. Is this documented?
Hmm I'm not sure (actually we should write it, maybe in the Task.init docstring?)
Specifically the function that is being called is:
https://gi...
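For reference, a minimal sketch of the workaround, assuming the fixed set_random_seed behaves as described above (the project/task names are placeholders, and the behavior of passing None is my assumption, so worth double checking against the docstring):
```
from clearml import Task

# Call before Task.init() so ClearML uses (and does not override) your own seed
Task.set_random_seed(1234)

task = Task.init(project_name="examples", task_name="reproducibility")  # hypothetical names

# Assumption: passing None should disable ClearML's seeding entirely once the fix lands
# Task.set_random_seed(None)
```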
UnevenDolphin73 are you positive, is this reproducible? What are you getting?
Very lacking wrt how things interact with one another
If I'm reading it correctly, what you are saying is that some of the "big picture" / holistic approach on how different parts interact with one another is missing, is that correct?
I think ClearML would benefit itself a lot if it adopted a documentation structure similar to numpy ecosystem
Interesting thought, what exactly would you suggest we "borrow" in terms of approach?
What's the trains-server version?
- Maybe we should add an option, archive components as well ...
I'm sorry my bad, this is use_current_task
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663
```
task = Task.init(...)
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)
```
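A slightly fuller sketch of that flow (the project/dataset names and the local folder are placeholders I made up):
```
from clearml import Task, Dataset

# Reuse the current task as the dataset's backing task via use_current_task=True
task = Task.init(project_name="examples", task_name="dataset in task")
dataset = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="examples",
    use_current_task=True,
)
dataset.add_files("./data")  # local folder to version
dataset.upload()
dataset.finalize()
```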
Hi @<1690896098534625280:profile|NarrowWoodpecker99>
Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,
yes it does.
Are there configuration options available that allow us to control this behavior?
I'm assuming you're thinking of dynamically loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial, and in reality, to be fast enough the model has to live in RAM and then be moved to the GPU (...
It only happens in the clearml environment, works fine locally.
Hi BoredHedgehog47
what do you mean by "in the clearml environment" ?
RobustGoldfish9
I think you need to set the trains-agent docker to be aware of the host, so it knows how to mount data/cache/configurations into the sibling docker
It should look something like:
TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains"
So if running a docker:
docker run -e TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains" ...
Hi @<1524922424720625664:profile|TartLeopard58>
can’t i embed scalars to notion using clearml sdk?
I think that you need the hosted version for it (it needs some special CORS stuff on the server side to make it work)
Did you try in the clearml report? does that work?
then I will have to rerun the pipeline code, then manually get the id and update the task.
Makes total sense to me!
Failed auto-generating package requirements: _PyErr_SetObject: exception SystemExit() is not a BaseException subclass
Not sure why you are getting this one?!
ValueError: No projects found when searching for
MyProject/.pipelines/PipelineName
hmm, what are you getting with:
task = Task.get_task(pipeline_uid_here)
print(task.get_project_name())
Hi GiddyTurkey39 ,
When you say trains agent, are you referring to the trains agent command ...
I mean running the trains-agent daemon
on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in virtual environment, or inside a docker)
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no ne...
The problem is that clearml installs
cudatoolkit=11.0
but
cudatoolkit=11.1
is needed.
You suggested this fix earlier, but I am not sure why it didn't work then.
Hmm, could you test with clearml-agent 0.17.2? Making sure this actually solves the problem
Hmm, maybe this is the issue:
Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):
- pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']
This makes no sense, conda is saying pytorch~=1.8.0 needs cudatoolkit >=10.1,<10.3 but it actually needs cudatoolkit 11.1
DilapidatedDucks58 long story short:
if you do:
```
from clearml import StorageManager
from clearml.storage.helper import StorageHelper

StorageHelper.get(" ", retries=5)
```
It should make sure that all the other s3:// links of this bucket will use the same original configuration (i.e. retries)
If this workaround works let's make sure we add it into the conf file, wdyt ?
Seems like passing the Task object is not working as expected (I'll make sure it is fixed).
Try:
dataset._task.set_parent(Task.current_task().id)
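In context, something like this (a sketch; the names are placeholders, and _task is an internal attribute, hence "workaround"):
```
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="create dataset")
dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")

# Workaround until passing the Task object directly is fixed:
# make the current task the parent of the dataset's internal task
dataset._task.set_parent(Task.current_task().id)
```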
gm folks, really liking ClearML so far as my top choice (after looking at dvc, mlflow), and thank you for your help here!
Thanks HurtWoodpecker30 !
Is there a recommended workflow to be able to “drop into” the exact env (code, venv, data) of a previous experiment (which may have been several commits ago), to reproduce that experiment?
You can use clearml-agent on your local machine to build the env of any Task,
` clearml-agent build --id <ta...
LudicrousParrot69 this is an implementation issue, this entire page is based on "task comparison"; a single Task means a totally different interface for querying the data 🙂
JitteryCoyote63
I agree that its name is not search-engine friendly,
LOL 😄
It was an internal joke the guys decided to call it "trains" cause you know it trains...
It was unstoppable, we should probably do a line of merch with AI 🚆 😉
Anyhow, this one definitely backfired...
Can you fix locally, just to verify ?
DistressedGoat23 notice the last argument in report_histogram, 'extra_layout'
https://clear.ml/docs/latest/docs/references/sdk/logger#report_histogram
You can then specify the plotly histogram orientation, full details here:
https://plotly.com/javascript/reference/bar/
I'm assuming the one you are after is 'orientation'
https://plotly.com/javascript/reference/bar/#bar-orientation
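Something along these lines (a sketch; the names and data are made up, and whether 'orientation' is honored through extra_layout is my assumption based on the plotly reference above, so worth verifying):
```
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="histogram orientation")

Logger.current_logger().report_histogram(
    title="my histogram",
    series="values",
    values=[3, 7, 2, 9],           # made-up data
    iteration=0,
    xlabels=["a", "b", "c", "d"],
    extra_layout={"orientation": "h"},  # plotly bar orientation, passed through extra_layout
)
```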
yes, TrickySheep9 use the k8s glue from here:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
Now that we have the free tier (a.k.a community server) we might change the default behavior.
The idea is always to allow an easy way to on-board and test the system.
ReassuredTiger98
BTW: what's the scenario where your machine reverted to the default configuration (i.e. no configuration file)?
trains-agent doesn't run the clone, it is pip...
basically calling "pip install git+https://..."
Not sure you can pass extra arguments
Also, this is not a setup problem, otherwise it would have been failing consistently ... this actually looks like a network issue.
The only thing I can think of is retrying the install if we get a network error (not sure what the exit code of pip is though, maybe 9?)