
This is very odd, can you post the log?
Hi RipeGoose2
Can you try with the latest from git?
pip install -U git+
I mean the caching will work, but it will reinstall this repository on top of the cached copy.
Make sense?
Hi EnviousStarfish54
Verified with the frontend / backend guys.
The backend allows searching for "all" tags, and the frontend will add a toggle button to the UI to select or/all for the selected tags.
Should be part of the next release
Hi SmoothSheep78
Do you need to import the previous state of the trains-server, or are you starting from scratch?
Check the examples on the GitHub page, I think this is what you are looking for
https://github.com/allegroai/trains-agent#running-the-trains-agent
@<1523701868901961728:profile|ReassuredTiger98> what are you getting with:
nvidia-smi
And here:
ls -la /usr/local/
I still have name my_name, but the project name is my_project/.datasets/my_name rather than my_project/.datasets
Yes, this is the expected behavior
And I don't see any new projects / subprojects where that dataset creation Task is stored
They are marked "hidden", so by default you cannot see them in the UI (they will only appear in the Dataset page).
You can turn on the UI hidden flag by going to your settings page and selecting "Con...
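To make this concrete, here is a minimal sketch (all names hypothetical) of creating such a dataset and checking where its backing task ends up; it assumes the dataset ID matches the ID of the Task backing it:

# Hedged sketch, hypothetical names: the dataset is backed by a Task stored under
# the hidden "my_project/.datasets/my_name" project, which is why it only shows up
# in the Dataset page unless the hidden flag is turned on.
from clearml import Dataset, Task

dataset = Dataset.create(dataset_name="my_name", dataset_project="my_project")
dataset.add_files("./data")  # hypothetical local folder
dataset.upload()
dataset.finalize()

# Assumption: the dataset ID doubles as the ID of its backing Task
backing_task = Task.get_task(task_id=dataset.id)
print(backing_task.get_project_name())  # expected: my_project/.datasets/my_name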
Hi UnevenDolphin73
Maybe. When the container spins up, are there any identifiers regarding the task etc. available?
You mean at the container level, or in ClearML?
I create a folder on the bucket per python train.py so that the environment variables file doesn't get overwritten if two users execute almost simultaneously
Nice! I have an idea: how about per user ID? Then they can access their "secrets" based on the owner of the Task:
task.data.user
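A minimal sketch of that idea (the bucket layout is an assumption, not a ClearML convention), using the task owner from task.data.user as suggested above:

# Hedged sketch: build a per-user prefix for the environment-variables files,
# so two users running "python train.py" almost simultaneously don't overwrite each other.
# The bucket path below is hypothetical.
from clearml import Task

task = Task.current_task()  # assumes Task.init() was already called in this process
user_id = task.data.user    # owner (user ID) of the Task, as suggested above
env_files_prefix = "s3://my-bucket/env-files/{}/".format(user_id)  # hypothetical path
print(env_files_prefix)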
Hi TartBear70
I'm setting up reproducibility myself but when I call Task.init() the seed is changed
Correct
. Is it possible to tell clearml not to initialize any rng? It appears that task.set_random_seed() doesn't change anything.
I think this is now fixed (meaning it should be part of the post-weekend release)
. Is this documented?
Hmm, I'm not sure (actually we should write it, maybe in the Task.init docstring?)
Specifically the function that is being called is:
https://gi...
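For reference, a minimal sketch of taking control of the seeding yourself; it assumes the fixed behavior where calling Task.set_random_seed(None) before Task.init() tells ClearML to skip its own seeding, so treat the None behavior as an assumption:

# Hedged sketch: keep ClearML from overriding your own RNG setup.
# Assumes Task.set_random_seed(None) disables ClearML's seeding (the fix mentioned above).
import random

from clearml import Task

Task.set_random_seed(None)  # assumed: ask ClearML not to touch the RNGs
task = Task.init(project_name="examples", task_name="reproducibility")  # hypothetical names

random.seed(1234)  # now set your own seeds explicitly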
UnevenDolphin73 are you positive, is this reproducible? What are you getting?
Very lacking wrt to how things interact with one another
If I'm reading it correctly, what you are saying is that some of the "big picture" / holistic approach on how different parts interact with one another is missing, is that correct?
I think ClearML would benefit itself a lot if it adopted a documentation structure similar to numpy ecosystem
Interesting thought, what exactly would you suggest we "borrow" in terms of approach?
What's the trains-server version?
- Maybe we should add an option, archive components as well ...
I'm sorry, my bad, this is use_current_task:
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663
task = Task.init(...)
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)
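Filled out as a minimal runnable sketch (project, dataset name, and data folder are hypothetical):

# Hedged sketch: use_current_task=True attaches the dataset to the task created by
# Task.init(), so no separate dataset task is spawned.
from clearml import Dataset, Task

task = Task.init(project_name="examples", task_name="dataset-in-task")  # hypothetical names
dataset = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="examples",
    use_current_task=True,  # reuse the task created above
)
dataset.add_files("./data")  # hypothetical local folder
dataset.upload()
dataset.finalize()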
Hi @<1532532498972545024:profile|LittleReindeer37>
This is truly a great discussion to have. Personally I think the main difference is that software development is a somewhat linear process, and git captures it very well. But ML is a much wider, nonlinear process, which to me means that trying to conform the same workflow into a dev tree will end up failing. The way ClearML thinks about it (and I think the analogy to source control is correct) is probably closer to how you think about proj...
Hi CooperativeFox72
But my docker image has all my code and all the packages it needs; I don't understand why the agent needs to install all of those again?
So based on the docker file you previously posted, I think all your python packages are actually installed on the "appuser" and not as system packages.
Basically remove the "add user" part and the --user from the pip install.
For example:
FROM nvidia/cuda:10.1-cudnn7-devel
ENV DEBIAN_FRONTEND noninteractive
RUN ...
Hi @<1690896098534625280:profile|NarrowWoodpecker99>
Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,
yes it does.
Are there configuration options available that allow us to control this behavior?
I'm assuming you're thinking of dynamic loading/unloading of models from memory based on requests?
I wish Triton added that; this is not trivial, and in reality, to be fast enough the model has to live in RAM and then be moved to the GPU (...
It only happens in the clearml environment, works fine local.
Hi BoredHedgehog47
what do you mean by "in the clearml environment" ?
RobustGoldfish9
I think you need to set the trains-agent docker to be aware of the host, so it knows how to mount data/cache/configurations into the sibling docker
It should look something like:
TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains"
So if running a docker:
docker run -e TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains" ...
Hi @<1524922424720625664:profile|TartLeopard58>
can't I embed scalars to Notion using the clearml SDK?
I think that you need the hosted version for it (it needs some special CORS stuff on the server side to make it work)
Did you try in the clearml report? does that work?
then will have to rerun the pipeline code then manually get the id and update the task.
Makes total sense to me!
Failed auto-generating package requirements: _PyErr_SetObject: exception SystemExit() is not a BaseException subclass
Not sure why you are getting this one?!
ValueError: No projects found when searching for
MyProject/.pipelines/PipelineName
hmm, what are you getting with:
task = Task.get_task(pipeline_uid_here)
print(task.get_project_name())
Hi GiddyTurkey39,
When you say trains agent, are you referring to the trains agent command ...
I mean running the trains-agent daemon
on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in a virtual environment or inside a docker)
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no ne...
The problem is that clearml installs cudatoolkit=11.0 but cudatoolkit=11.1 is needed.
You suggested this fix earlier, but I am not sure why it didn't work then.
Hmm, could you test with clearml-agent 0.17.2? Making sure this actually solves the problem.
Hmm, maybe this is the issue:
Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):
- pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']
This makes no sense: conda is saying pytorch=1.8 needs cudatoolkit <10.2/10.3, but it actually needs cudatoolkit 11.1