WackyRabbit7 just making sure I understand: MedianPredictionCollector.process_results is called after the pipeline is completed.
Then inside the function, Task.current_task() returns None.
Is this correct?
Looking at this example here, it looks like it only works with tasks:
Aha! A Pipeline is a Task 🙂 (a specific type of Task, but a Task nonetheless)
Just use the pipeline ID, and make sure you push it into the services queue, voila
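Something like this, as a sketch (the helper name is made up, but since the pipeline controller is a Task, the real Task.enqueue API applies to it):

```python
def enqueue_pipeline(pipeline_task_id: str, queue_name: str = "services"):
    # The pipeline controller is itself a Task, so its ID can be pushed
    # into the services queue like any other Task.
    from clearml import Task  # lazy import; needs a configured ClearML setup
    Task.enqueue(task=pipeline_task_id, queue_name=queue_name)
```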
Glad to hear!
(yeah @<1603198134261911552:profile|ColossalReindeer77> I'm with you the override is not intuitive, I'll pass the info to the technical writers, hopefully they can find a way to make it easier to understand)
but somewhere along the way, the request actually removes the header
Where are you seeing the returned value?
Oh sorry:
pip install clearml-agent==1.2.0rc4
It also automatically detects if you have an active venv inside the container and uses it instead of the system-wide python
others from the local environment and this causes a conflict when importing the attr module
Inside the docker? "local environment"?
This is all under "root" no?
I see them running reliably (not killed). Are they running in service mode?
How do you deploy agents, with the clearml k8s glue ?
the second seems like a botocore issue:
https://github.com/boto/botocore/issues/2187
What is the Model url?
print(model.url)
How does a task specify which docker image it needs?
Either in the code itself 'task.set_base_docker' or with the CLI, or set it in the UI when you clone an experiment (everything becomes editable)
Hi PompousBeetle71 I'm with SteadyFox10 on this one. Unless you choose a file name based on epoch or step , you are literally overwriting the model file, which Trains will reflect. If you use epoch in the filename you will end up with all your models logged by Trains. BTW we are actively working on integration with pytorch ignite, so if you have any suggestions now is the time :)
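For example, a tiny helper that bakes the epoch into the filename (names here are illustrative, not part of the Trains API):

```python
def checkpoint_name(epoch: int, base: str = "model") -> str:
    # Put the epoch in the filename so each checkpoint is a distinct file,
    # instead of overwriting the same model file on every save.
    return f"{base}_epoch_{epoch:04d}.pt"

print(checkpoint_name(7))  # model_epoch_0007.pt
```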
from clearml.backend_api.session.client import APIClient

c = APIClient()
c.projects.update(project="project-id-here", system_tags=[])
My task starts up and checks the mounted EFS volume for x data, if x data does not exist there, it then pulls x data from S3.
BoredHedgehog47 you can just use StorageManager and configure clearml cache for the EFS, it will essentially do the same 🙂
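A minimal clearml.conf fragment for that, assuming the EFS volume is mounted at a path like /mnt/efs (the mount path is a placeholder; sdk.storage.cache.default_base_dir is the real setting):

```
sdk {
  storage {
    cache {
      # point the local download cache at the EFS mount
      default_base_dir: "/mnt/efs/clearml-cache"
    }
  }
}
```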
Regarding the helm chart with EFS,
you need to configure the clearml-glue pod template with the EFS mount
example:
https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/e7f647f4e6fc76f983d61522e635353005f1472f/examples/kubernetes/volu...
Oh what if the script is in the container already?
Hmm, the idea of clearml is that the container is a "base environment" and code is "injected", this makes sure it is easy to reuse it.
The easiest way is to add an "entry point" script that just calls the existing script inside the container.
You can have this python initial script on your local machine then when you call clearml-task it will upload the local "entry point" script directly to the Task, and then on the remote machin...
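A minimal sketch of such an "entry point" wrapper (the container-internal path is a hypothetical example):

```python
import subprocess
import sys

def run_container_script(script_path: str, *args: str) -> int:
    # Thin "entry point": just launch a script that already lives
    # inside the container, forwarding any extra arguments.
    return subprocess.call([sys.executable, script_path, *args])

# Hypothetical usage, assuming the real code is baked into the image:
#   run_container_script("/opt/app/train.py", "--epochs", "10")
```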
Can I delete logs from existing experiments on the ClearML server?
Only by resetting the Task (which would delete everything), or deleting the Task itself.
You can also disable the auto console log and report manually:
Task.init(..., auto_connect_streams=False)
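For the manual reporting part, something like this sketch (the helper name is made up; report_text is the actual Logger method):

```python
def report_manually(task, lines):
    # With auto_connect_streams=False nothing from stdout/stderr is
    # auto-captured, so push only the lines you care about explicitly.
    logger = task.get_logger()
    for line in lines:
        logger.report_text(line)
```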
Should have worked, the error you are getting is docker-compose parsing the yml file
Is this exactly the one from the trains-server repo ?
Hi @<1661180197757521920:profile|GiddyShrimp15>
I think there is a better channel for this kind of question
(they will be able to help with that)
BTW:
Just making sure, 74 was not supposed to be the last checkpoint (in other words it is not stuck on leaving the training process, but actually in the middle)
GiganticTurtle0 we had this discussion in the wrong thread, I moved it here.
Moved from the wrong thread
Martin.B [1:55 PM]
GiganticTurtle0 the sample mock pipeline seems to be running perfectly on the latest code from GitHub, can you verify?
Martin.B [1:55 PM]
Spoke too soon, sorry 🙂  issue is reproducible, give me a minute here
Alejandro C [1:59 PM]
Oh, and which approach do you suggest to achieve the same goal (simultaneously running the same pipeline with differen...
add_external_files with a very large number of urls that are not in the same S3 folder, without running into a usage limit due to the state.json file being updated a lot?
Hi ShortElephant92
what do you mean the state.json is updated a lot?
I think it is updated every time you call add_external_files, but add_external_files can get a folder to scan, which would be more efficient. How are you using it?
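To illustrate the folder-scan suggestion (the helper name is made up; add_external_files with a folder source_url is the real Dataset API):

```python
def link_s3_folder(dataset, folder_url: str):
    # Registering a whole folder means one scan / one state update,
    # instead of one update per individual URL.
    dataset.add_external_files(source_url=folder_url)
```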
My question is if there is an easy way to track gradients similar to
wandb.watch
@<1523705099182936064:profile|GrievingDeer61> not at the moment, but should be fairly easy to add.
Usually torch examples just use TB as a default logging, which would go directly to clearml, but this is a great idea to add
Could probably go straight to the next version 🙂
wdyt?
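In the meantime, a rough wandb.watch-style sketch you could roll yourself for PyTorch (the helper name and title are made up; report_scalar is the real Logger method):

```python
def watch_gradients(model, logger, iteration: int):
    # Call after loss.backward(): report each parameter's gradient norm
    # as a scalar series, one series per parameter name.
    for name, param in model.named_parameters():
        if param.grad is not None:
            logger.report_scalar(
                title="gradients", series=name,
                value=param.grad.norm().item(), iteration=iteration)
```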
btw: you can also configure --extra-index-url in the agent's clearml.conf
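For example, in the agent's clearml.conf (the index URL is a placeholder; agent.package_manager.extra_index_url is the real setting):

```
agent {
  package_manager {
    # extra pip index servers the agent passes to pip
    extra_index_url: ["https://my.pypi.mirror/simple"]
  }
}
```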
I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)
My thinking is the issue might be on the env file we are passing to conda, I can't find any other diff.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send a specific wheel with more debug prints for you to check (basically it will print the conda env YAML it is using)?
do I need to create a brand new dataset with a new name that inherits from the original?
Yes, you just create a new version, specify the parent one, add changes, and close it.
If you later need to, you can squash versions (same idea as git squash). Make sense?
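The create-a-child-version flow, roughly (the helper name is made up; Dataset.create with parent_datasets is the real API):

```python
def new_dataset_version(project: str, name: str, parent_id: str):
    from clearml import Dataset  # lazy import; needs a configured ClearML setup
    # Create a child version that inherits everything from the parent.
    child = Dataset.create(
        dataset_project=project,
        dataset_name=name,
        parent_datasets=[parent_id],
    )
    # child.add_files("/path/with/changes")  # add your changes here
    # child.upload()
    # child.finalize()  # "close" the version
    return child
```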