HealthyStarfish45 what exactly did you have in mind, in terms of the widget?
HealthyStarfish45 this sounds very cool! How can I help?
WackyRabbit7 my apologies for the lack of background in my answer
Let me start from the top: one of the goals of the trains-agent is to reproduce the "original" execution environment. Once that is done, it will launch the code and monitor it. In order to reproduce the original execution environment, trains-agent will install all the needed python packages, pull the code, and apply the uncommitted changes.
If your entire environment is python based, then virtual-environment mode is proba...
It seems like there is no way to define that a Task requires docker support from an agent, right?
Correct, basically the idea is you either have workers working in venv mode or docker.
If you have a mixture of the two, you can have the venv agents pull from one queue (say default_venv) and the docker-mode agents pull from a different queue (say default_docker). This way you always know what you are getting when you enqueue your Task.
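The routing described above can be made explicit on the client side. This is a minimal sketch, not an official recipe: the queue names default_venv and default_docker come from the message above, pick_queue and enqueue_by_mode are hypothetical helper names, and it assumes the ClearML SDK calls Task.get_task and Task.enqueue are available in your installed version.

```python
def pick_queue(needs_docker: bool) -> str:
    """Return the queue served by agents running in the required mode."""
    return "default_docker" if needs_docker else "default_venv"


def enqueue_by_mode(task_id: str, needs_docker: bool) -> str:
    """Enqueue an existing Task on the queue matching its mode (hypothetical helper)."""
    # Imported lazily so pick_queue works without a configured ClearML client.
    from clearml import Task

    queue_name = pick_queue(needs_docker)
    task = Task.get_task(task_id=task_id)
    Task.enqueue(task, queue_name=queue_name)
    return queue_name
```

Keeping the decision in one function means the rest of the code never hard-codes a queue name, so adding a third agent type later only touches pick_queue.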
If it helps, you can override it on the clients with an OS environment CLEARML_FILES_HOST
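A minimal sketch of that override from inside a script, for cases where you cannot export the variable in the shell. The helper name override_files_host and the URL are placeholders, and it is an assumption here that the variable has to be set before clearml is imported so the SDK picks it up when loading its configuration.

```python
import os


def override_files_host(url: str) -> None:
    """Point this client at a different files server (hypothetical helper)."""
    # Assumption: set this before importing clearml / calling Task.init,
    # so the SDK sees the value when it loads its configuration.
    os.environ["CLEARML_FILES_HOST"] = url


# Placeholder URL; replace it with your own files server address.
override_files_host("http://localhost:8081")
```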
I think they (DevOps) said something about next week, internal roll-out is this week (I think)
RoughTiger69
Apparently, [...] doesn't populate that dict with any keys that don't already exist in it.
Are you saying new entries are not added to the dict even if they are on the Task (i.e. only entries that already exist in the dict are populated)?
But you already have all the entries defined here:
https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L22
Since all this is ha...
Yep, this will work. BTW, check the new pipeline example; it might offer a more flexible solution:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/full_custom_pipeline.py
JitteryCoyote63 S3 should work. Go to your profile page and check whether some old credentials are already stored there; that may be the issue.
JitteryCoyote63 are you running the agent in docker mode ?
Hi JitteryCoyote63
I think this is the default python str() casting.
But you can specify the preview text when you call upload_artifact:
https://clear.ml/docs/latest/docs/references/sdk/task#upload_artifact
See the preview argument.
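A small sketch of passing a custom preview instead of relying on the default str() rendering. The helper names short_preview and upload_with_preview and the 80-character limit are illustrative choices, not part of ClearML; the task argument is assumed to be a clearml Task whose upload_artifact accepts name, artifact_object and preview.

```python
def short_preview(obj, max_chars: int = 80) -> str:
    """Build a bounded, human-readable preview string for an artifact."""
    text = repr(obj)
    if len(text) <= max_chars:
        return text
    # Truncate and mark the cut so the preview never exceeds max_chars.
    return text[: max_chars - 3] + "..."


def upload_with_preview(task, name: str, obj) -> None:
    """Upload obj as an artifact with an explicit preview (hypothetical helper)."""
    task.upload_artifact(name=name, artifact_object=obj, preview=short_preview(obj))
```

With a real Task this would be called as upload_with_preview(task, "my_artifact", my_object).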
Yes, the agent's mode is global, i.e. all tasks run either inside docker or in venv. In theory you can have two agents on the same machine, one venv and one docker, listening to two different queues.
ohh, the copy paste thing when you generate credentials ?
https://github.com/allegroai/clearml/issues/199
Seems already supported for a while now ...
BTW, [...] has this at the bottom:
Yes, it is the company's legal entity name. But I think that for referencing it makes more sense to mention the product name, ClearML.
I think this looks good
And I think the default is 100 entries, so it should not get cleaned.
and then they are all removed and for a particular task it even happens before my task is done
Is this reproducible ? Who is cleaning it and when?
The number of entries in the dataset cache can be controlled via clearml.conf: sdk.storage.cache.default_cache_manager_size
Is there any way to get just one dataset folder of a Dataset? e.g. only "train" or only "dev"?
They are usually stored in the same "zip" so basically you have to download both folders anyhow, but I guess if this saves space we could add this functionality, wdyt?
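Until such functionality exists, a workaround sketch is to fetch the full local copy and then use only the subfolder you need. As noted above, this still downloads everything. The helper names are hypothetical, and it assumes Dataset.get(dataset_id=...) and get_local_copy() behave as in current ClearML releases, with get_local_copy() returning a local directory path.

```python
import os


def subfolder_of_copy(local_copy: str, subfolder: str) -> str:
    """Return the path of one subfolder inside a local dataset copy."""
    path = os.path.join(local_copy, subfolder)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"{subfolder!r} not found under {local_copy!r}")
    return path


def get_dataset_subfolder(dataset_id: str, subfolder: str) -> str:
    """Download the whole dataset, then point at a single folder (hypothetical helper)."""
    # Imported lazily so subfolder_of_copy can be used without ClearML configured.
    from clearml import Dataset

    local_copy = Dataset.get(dataset_id=dataset_id).get_local_copy()
    return subfolder_of_copy(local_copy, subfolder)
```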
JitteryCoyote63 any chance the trains-agent-1 is running in services mode?
Which means it will spin up more than a single experiment at once
Yes, there was a bug where it was always cached; just upgrade clearml:
pip install git+
For now we've monkey-patched it to our use case:
LOL, that's a cool hack
That gives us the benefit of creating "local datasets" (confined to the scope of the project, do not appear in Datasets tabs, but appear as normal tasks within the project)
So what would be a "perfect" solution here?
I think I'm missing the point on why it became an issue in the first place.
Notice that in new versions Dataset will be registered on the Tasks that use them (they are already...
Hi ShaggySwan64
I'm guessing just copying the data folder with rsync is not the most robust way to do that since there can be writes into mongodb etc.
Yep
Does anyone have experience with something like that?
Basically, you should back up the three DBs (mongo, redis, elastic), each following its own backup workflow. Then just rsync the files server and configuration.
I guess last followup question, is there a way to cap costs?
Scale tier? (I know it is not per usage, but it is probably more than $15 per user)
Hmm there was this one:
https://github.com/allegroai/clearml/commit/f3d42d0a531db13b1bacbf0977de6480fedce7f6
Basically it was always caching steps (hence the skip). You can install from the main branch to verify this is the issue. An RC is due in a few days (it was already supposed to be out but got a bit delayed).
Why do you ask? is your server sluggish ?
Pretty confusing that neither
services
StickyLizard47 basically this is how a services queue agent should be spun up:
https://github.com/allegroai/clearml-server/blob/9b108740da21f25407bd2c59583ca1c86f8e1faa/docker/docker-compose.yml#L123
When spinning on a k8s cluster, this is a bit more complicated, as it needs to work with the clearml-k8s-glue.
See here how to spin it up on k8s:
https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue
btw: any specific reason to call current_task after you closed the main Task ?
Can you please tell me how to return the folder where the script should run?
add it to the python path
PYTHONPATH="/src/project"
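If setting the environment variable is not convenient, the same effect can be sketched from inside the script. This is an in-process alternative, not what the answer above prescribes: it assumes the project root is /src/project as in the message, and it only affects imports made after these lines run.

```python
import sys

# Path taken from the message above; adjust it to your own project root.
PROJECT_ROOT = "/src/project"

# Put the project root first so its modules win over same-named packages.
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)
```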
Hi ExcitedFish86
In Pytorch-Lightning I use DDP
I think a fix for pytorch multi-node / process distribution was committed to 1.0.4rc1, could you verify it solves the issue? (rc1 should fix this specific issue)
BTW: no problem working with clearml-server < 1