others from the local environment and this causes a conflict when importing the attr module
Inside the docker? "local environment"?
This is all under "root" no?
I see them run reliably (no killed), are they running in service mode?
How do you deploy agents, with the clearml k8s glue ?
the second seems like a botocore issue :
https://github.com/boto/botocore/issues/2187
What is the Model url?
print(model.url)
How does a task specify which docker image it needs?
Either in the code itself 'task.set_base_docker' or with the CLI, or set it in the UI when you clone an experiment (everything becomes editable)
Hi PompousBeetle71 I'm with SteadyFox10 on this one. Unless you choose a file name based on epoch or step, you are literally overwriting the model file, which Trains will reflect. If you use the epoch in the filename you will end up with all your models logged by Trains. BTW we are actively working on integration with PyTorch Ignite, so if you have any suggestions now is the time :)
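A minimal sketch of the per-epoch file name idea, assuming a PyTorch-style training loop. The helper name `checkpoint_path`, the `model` prefix and the `.pt` suffix are illustrative choices, not something Trains requires:

```python
from pathlib import Path


def checkpoint_path(output_dir: str, epoch: int, prefix: str = "model") -> Path:
    """Return a per-epoch checkpoint path so earlier files are not overwritten."""
    # Zero-padding keeps the files sorted in epoch order when listed by name.
    return Path(output_dir) / f"{prefix}_epoch_{epoch:04d}.pt"


# Hypothetical usage inside a training loop:
# torch.save(model.state_dict(), checkpoint_path("checkpoints", epoch))
```

Because every epoch gets its own file, each saved model should show up as a separate entry instead of replacing the previous one.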
from clearml.backend_api.session.client import APIClient
c = APIClient()
c.projects.update(project="project-id-here", system_tags=[])
My task starts up and checks the mounted EFS volume for x data, if x data does not exist there, it then pulls x data from S3.
BoredHedgehog47 you can just use StorageManager and configure the clearml cache for the EFS, it will essentially do the same :)
Regarding the helm chart with EFS,
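For reference, the check-then-download flow from the question can be sketched as below. This is only an illustration: `get_data` and its `download` parameter are hypothetical names, and the default downloader assumes `StorageManager.get_local_copy` with the clearml cache already pointed at the EFS mount in clearml.conf. With that cache configured, calling `StorageManager.get_local_copy` directly should already avoid repeated downloads.

```python
from pathlib import Path
from typing import Callable


def _download_with_clearml(remote_url: str) -> str:
    # Imported lazily so the helper can be used without clearml installed.
    from clearml import StorageManager

    # Returns the path of the cached local copy of the remote object.
    return StorageManager.get_local_copy(remote_url=remote_url)


def get_data(local_path: str, remote_url: str,
             download: Callable[[str], str] = _download_with_clearml) -> str:
    """Return local_path if it already exists, otherwise download remote_url."""
    if Path(local_path).exists():
        return local_path
    return download(remote_url)
```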
you need to configure the clearml-glue pod template with the EFS mount
example :
https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/e7f647f4e6fc76f983d61522e635353005f1472f/examples/kubernetes/volu...
Oh what if the script is in the container already?
Hmm, the idea of clearml is that the container is a "base environment" and code is "injected", this makes sure it is easy to reuse it.
The easiest way is to add an "entry point" script that just calls the existing script inside the container.
You can have this python initial script on your local machine then when you call clearml-task it will upload the local "entry point" script directly to the Task, and then on the remote machin...
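Such an "entry point" script could look roughly like the sketch below. The function name `run_existing_script` and the path `/opt/app/train.py` are placeholders for whatever already exists in your image:

```python
import runpy
import sys
from typing import List


def run_existing_script(script_path: str, args: List[str]) -> None:
    """Execute a script that already lives in the container as if it were __main__."""
    old_argv = sys.argv
    # Let the target script see its own path and arguments in sys.argv.
    sys.argv = [script_path, *args]
    try:
        runpy.run_path(script_path, run_name="__main__")
    finally:
        # Restore the caller's arguments even if the script raises.
        sys.argv = old_argv


# Hypothetical usage at the bottom of the entry point file:
# run_existing_script("/opt/app/train.py", sys.argv[1:])
```

`runpy.run_path` keeps everything in one Python process; `subprocess.run` would work as well if the inner script should run in a separate process.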
Can I delete logs from existing experiments on the ClearML server?
Only by resetting the Task (which would delete everything), or deleting the Task itself.
You can also disable the auto console log, and report manually ?
Task.init(..., auto_connect_streams=False)
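A rough sketch of what "report manually" could look like, assuming you only want some lines to reach the server. `report_selected` and `main` are illustrative names, and the project and task names are placeholders; `Task.init(auto_connect_streams=False)` and `Logger.report_text` are the clearml calls being used:

```python
from typing import Callable, Iterable


def report_selected(lines: Iterable[str], report: Callable[[str], None],
                    keep: Callable[[str], bool]) -> int:
    """Send only the lines accepted by `keep` to `report`; return how many were sent."""
    sent = 0
    for line in lines:
        if keep(line):
            report(line)
            sent += 1
    return sent


def main() -> None:
    # Needs the clearml package and a configured server; not run on import.
    from clearml import Task

    task = Task.init(project_name="examples", task_name="manual console log",
                     auto_connect_streams=False)  # stdout/stderr are not captured
    logger = task.get_logger()
    report_selected(["epoch 1 loss=0.9", "debug: noisy detail"],
                    logger.report_text,
                    keep=lambda s: not s.startswith("debug"))
```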
Should have worked, the error you are getting is docker-compose parsing the yml file
Is this exactly the one from the trains-server repo ?
Hi @<1661180197757521920:profile|GiddyShrimp15>
I think the is a better channel for this kind of question
(they will be able to help with that)
BTW:
Just making sure, 74 was not supposed to be the last checkpoint (in other words it is not stuck on leaving the training process, but actually in the middle)
GiganticTurtle0 we had this discussion in the wrong thread, I moved it here.
Moved from the wrong thread
Martin.B [1:55 PM]
GiganticTurtle0 the sample mock pipeline seems to be running perfectly on the latest code from GitHub, can you verify ?
Martin.B [1:55 PM]
Spoke too soon, sorry, the issue is reproducible, give me a minute here
Alejandro C [1:59 PM]
Oh, and which approach do you suggest to achieve the same goal (simultaneously running the same pipeline with differen...
add_external_files with a very large number of urls that are not in the same S3 folder without running into a usage limit due to the state.json file being updated a lot?
Hi ShortElephant92
what do you mean the state.json is updated a lot?
I think that every time you call `add_external_files` it is updated, but `add_external_files` can get a folder to scan, which would be more efficient. How are you using it?
My question is if there is an easy way to track gradients similar to wandb.watch
@<1523705099182936064:profile|GrievingDeer61> not at the moment, but should be fairly easy to add.
Usually torch examples just use TB as a default logging, which would go directly to clearml , but this is a great idea to add
Could probably go straight to the next version :)
wdyt?
btw: you can also configure --extra-index-url in the agent's clearml.conf
I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)
My thinking is the issue might be on the env file we are passing to conda, I can't find any other diff.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send a specific wheel with more debug prints for you to check (basically it will print the conda env YAML it is using)?
do I need to create a brand new dataset with a new name that inherits from the original?
Yes, you just create a new version, specify the parent one, add changes and close it.
If you later need to, you can squash a version (same idea as git squash). Make sense?
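As a sketch of the "new version with a parent" flow, assuming the clearml `Dataset` SDK. `create_child_version` and its arguments are illustrative names; you would supply the parent id, names and local folder yourself:

```python
def create_child_version(parent_id: str, dataset_name: str,
                         dataset_project: str, new_files_dir: str) -> str:
    """Create a new dataset version on top of `parent_id` and return its id."""
    # Imported lazily: needs the clearml package and a configured server.
    from clearml import Dataset

    child = Dataset.create(
        dataset_name=dataset_name,
        dataset_project=dataset_project,
        parent_datasets=[parent_id],  # inherit the content of the parent version
    )
    child.add_files(path=new_files_dir)  # register the added or changed files
    child.upload()    # upload the pending files
    child.finalize()  # close the version
    return child.id
```

The dataset name can stay the same as the original; the parent link is what makes this a new version rather than an unrelated dataset.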
is this a config file on your side or something I can change, if we had enterprise version?
Yes, this is one of the things you can configure
Hi @<1593413673383104512:profile|MiniatureDragonfly17>
These are the specific model input/output layers name.
The way Triton names the layers of a PyTorch model is usually input__0, then input__1 and so on for the input layers, and output__0 and so on for the results.
You can see an example here:
--input-size 1 28 28 --input-name "INPUT__0" --input-type float32 --output-size -1 10 --output-name "OUTPUT__0" --outpu...
EnthusiasticCoyote30 you can register an existing Model with:
from clearml import InputModel
model = InputModel.import_model(weights_url=" "...)
Hi SuperficialGrasshopper36
You are definitely onto a bug :)
It seems that with the new poetry, we fail to set the target venv (basically it decides for itself); from that point on, the actual code is not executed inside the correct venv.
Could you please open a GitHub issue?
I want to make sure this will be addressed :)
Hi JitteryCoyote63 ,
upload_artifact was designed to upload pre-made artifacts, which actually covers everything.
With register_artifact we tried to have something that will constantly log a pandas DataFrame artifact. The use case was logging the examples used for training and their order, so we could compare the execution of two different experiments and detect dataset contamination etc.
Not sure it is actually useful though ...
Retrieving an artifact from a Task is done by:
` Task.get_task(task_id='aaa').artifact...
and I've made a script to edit it to our needs as part of the installation process
Thanks Martin!
My pleasure, btw: there is no actual need to configure all the clearml.conf values. It will actually take the defaults from the clearml package itself. This means you only need something like:
` api {
server config here
}
sdk.aws.s3{
minio config here
} `