The file is never touched; nowhere in the process is that file deleted.
It should never have gotten there; this is not the git repo folder, it is one level above...
I suspect it failed to create one on the host and then mount it into the docker container
ouch, I think you are correct, can you test a fix?
StraightDog31 how did you get these ?
It seems like it is coming from matplotlib, no?
I got everything working using the default queue. I can submit an experiment, and a new GPU node is provisioned, all good
Nice!
My next question, how do I add more queues?
You can create new queues in the UI and spin up a new agent/glue to serve each queue (basically, think of a queue as an abstraction for a specific type of resource), for example:
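A minimal sketch (the queue name and GPU index are placeholders; whether you run a plain agent daemon or the k8s/AWS glue against the queue depends on your setup):
```bash
# have an agent serve the new queue (created in the UI, or created on first use)
trains-agent daemon --queue gpu_queue --gpus 0
```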
Make sense ?
Ohh now I get it...
Wait a couple of hours, 0.16 is out today with the trains-agent --stop flag 🙂
I have a lot of parameters, about 40. It is inconvenient to overwrite them all from the edit window on the screen.
Not sure I follow, so what are you suggesting?
And does the Executor actually run something, or is it IO?
I think it fails because it tries to install trains twice. Could you remove the trains package and test? I'm also curious how you have both installed?!
It should have worked....
Can you run the examples from the repo and see if they work?
Actually scikit-learn implies joblib 🙂 (so you should use scikit-learn; anyhow I'll make sure we add joblib as it is more explicit)
ERROR: Error checking for conflicts. ... AttributeError: _DistInfoDistribution__dep_map
@<1538330703932952576:profile|ThickSeaurchin47> can you try the artifacts example:
None
and in this line do:
task = Task.init(project_name='examples', task_name='Artifacts example', output_uri="...")
(with your storage URI, e.g. the s3:// address of your minio bucket, as the output_uri value)
WittyOwl57
To get task IDs use (e.g. all the tasks of a specific project):
task_ids = Task.query_tasks(project_name="examples", task_filter={'status': ["completed"]})
Then per task:
for t_id in task_ids:
    t = Task.get_task(t_id)
    conf_dict = t.get_configuration_as_dict(name="filter")
    task_param = t.get_parameters()
    task_param['filter'] = conf_dict
    # this enables forcefully updating parameters post execution
    t.mark_started(force=True)
    # update hyper-parameters ...
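For completeness, a sketch of how the whole flow might look end-to-end; the closing set_parameters / mark_stopped calls are my assumption about how to finish the update, the rest follows the snippet above:
```python
from clearml import Task

# all completed tasks of a specific project
task_ids = Task.query_tasks(project_name="examples", task_filter={'status': ["completed"]})

for t_id in task_ids:
    t = Task.get_task(t_id)
    # pull the stored configuration object and the current hyper-parameters (as in the snippet above)
    conf_dict = t.get_configuration_as_dict(name="filter")
    task_param = t.get_parameters()
    task_param['filter'] = conf_dict
    # mark the task as started so parameters can be forcefully updated post execution
    t.mark_started(force=True)
    # write the updated hyper-parameters back, then return the task to a stopped state (assumption)
    t.set_parameters(task_param)
    t.mark_stopped()
```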
Hi @<1538330703932952576:profile|ThickSeaurchin47>
Specifically I’m getting the error “could not access credentials”
Put your minio credentials here:
None
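Roughly, the relevant clearml.conf section looks like this (the host and keys are placeholders; secure: false assumes a plain-HTTP MinIO endpoint):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "my-minio-host:9000"   # placeholder MinIO endpoint
                    key: "minio-access-key"      # placeholder
                    secret: "minio-secret-key"   # placeholder
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```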
Q. Would someone mind outlining the steps for configuring the default storage locations, such that any artefacts or data pushed to the server are stored by default on the Azure Blob Store?
Hi VivaciousPenguin66
See my reply here on configuring the default output uri on the agent: https://clearml.slack.com/archives/CTK20V944/p1621603564139700?thread_ts=1621600028.135500&cid=CTK20V944
Regarding permission setup:
You need to make sure you have the Azure blob credentials ...
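As a rough sketch (the account, key and container names below are placeholders), the clearml.conf on the machines running the tasks would contain something like:
```
sdk {
    azure.storage {
        containers: [
            {
                account_name: "myaccount"            # placeholder
                account_key: "myaccountkey"          # placeholder
                container_name: "clearml-artifacts"  # placeholder
            }
        ]
    }
    development {
        # default destination for artifacts / models
        default_output_uri: "azure://myaccount.blob.core.windows.net/clearml-artifacts"
    }
}
```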
It’s only on this specific local machine that we’re facing this truncated download.
Yes, that's what the log says; makes sense.
Seems like this still doesn’t solve the problem, how can we verify this setting has been applied correctly?
Hmm, exec into the container? What did you put in clearml.conf?
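Something along these lines (the container name and conf path are assumptions; the conf is usually at /root/clearml.conf inside the agent container):
```bash
# replace <agent-container> with your container name or id
docker exec -it <agent-container> cat /root/clearml.conf
```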
I think the main difference is that I see value in having access to the raw format within the cloud vendor, and not only having it as an archive
I see, it does make sense.
Two options: one, as you mentioned, use the ClearML StorageManager to upload the files, then register them as external links with Dataset (see the sketch below).
Two, I know the enterprise tier has HyperDatasets, which are essentially what you describe, with version control over the "metadata" and "raw storage" on GCP, including the ab...
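For option one, a rough sketch (the bucket path and dataset names are placeholders; add_external_files assumes a reasonably recent clearml version):
```python
from clearml import Dataset, StorageManager

# upload the raw file to your cloud storage in its native format
remote_url = StorageManager.upload_file(
    local_file='data/sample.parquet',
    remote_url='gs://my-bucket/raw/sample.parquet',  # placeholder bucket/path
)

# register it in a Dataset as an external link (no copy into ClearML storage)
dataset = Dataset.create(dataset_name='raw-data', dataset_project='examples')
dataset.add_external_files(source_url=remote_url)
dataset.upload()    # stores the dataset state/metadata
dataset.finalize()
```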
SmarmySeaurchin8
When running in "dev" mode (i.e. writing the code), only packages imported directly are registered under "installed packages"; then, when the agent is executing the experiment, it will update back the entire environment (including derivative packages etc.).
That said, you can set detect_with_pip_freeze to true (in trains.conf) and it will basically store the entire pip freeze.
https://github.com/allegroai/trains/blob/f8ba0495fb3af1f99732fdffbbccd2fa992934a4/docs/trains.c...
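i.e. something along these lines in the conf file (shown under the sdk.development section; the same key applies in trains.conf / clearml.conf):
```
sdk {
    development {
        # store the full "pip freeze" output instead of only the directly imported packages
        detect_with_pip_freeze: true
    }
}
```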
Hi QuaintPelican38
Assuming you have opened the default SSH port 10022 on the EC2 instance (and assuming the AWS permissions are set so that you can access it), you need to use the --public-ip flag when running clearml-session. Otherwise it "thinks" it is running on a local network and registers itself with the local IP. With the flag on, it gets the public IP of the machine, and then the clearml-session running on your machine can connect to it.
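For example (the queue name is a placeholder; check clearml-session --help for the exact flag form in your version):
```bash
clearml-session --queue aws_gpu_queue --public-ip true
```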
Make sense ?
create inside another task that would again run remotely
This Task will run on another node; user / permissions will be handled by the agent on the other node running the Task
(But in venv mode it also hangs the same way)
Hmm this is strange, could it be you are running out of storage ?
You need trains-server support, so if trains v0.15 is working with an older backend it will revert to the "training" type