Hi DeterminedToad86
I just verified on a clean SageMaker instance, everything should just work, see here: https://demoapp.demo.clear.ml/projects/0e919ea1cc5c499b99e1ab85004b6e97/experiments/887edef09d4549e88b829a34c87d4d5b/output/execution
Yes, if you have more than one file (either notebook or python script) then you must have a git repo in order to run the task using the Agent.
@<1710827340621156352:profile|HungryFrog27> the venv-build folder is supposed to be deleted after each task is done. How did you end up with leftovers? Could it be Windows was failing to delete it for some reason? That actually connects with your initial issue, no?
Hi RotundHedgehog76
we have issues with clearml-agent when using standalone mode. ...
What is the use case for standalone mode? Is this venv or docker mode?
Please go ahead with the PR 🙂
I think the part that is missing for me is the context; in other words, how would one configure the execution_plan, and why would they configure it in a specific way?
My intuition, without fully understanding it, is that for some reason the internal DAG/decision logic is exposed to the user, and it feels like too much information. Basically I have a hunch that users should not need such a deep understanding to control the flow; they should end up with an abstraction on top of it. ...
are you referring to the same line? 47 in cache.py?
Yes, that makes sense. But did you see the callback being executed? It seems it was supposed to, and then the next call would have been 2:30 hours later; am I missing something?
I just tested the master with https://github.com/jkhenning/ignite/blob/fix_trains_checkpoint_n_saved/examples/contrib/mnist/mnist_with_trains_logger.py on the latest ignite master and Trains, it passed, but so did the previous commit...
Hi @<1618056041293942784:profile|GaudySnake67>
Task.create is designed to create an external Task, not one from the current running process. Task.init is for creating a Task from your currently executing code, and this is why it has all the auto_connect parameters.
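For example, a minimal sketch of the difference (project/task names, repo and script here are just placeholders):
from clearml import Task

# Task.init: attaches to the currently running code, with the
# auto_connect_* magic (console, frameworks, argparse, ...)
task = Task.init(project_name="examples", task_name="current run")

# Task.create: registers an external Task without executing it here,
# so no auto-logging is attached to this process
external = Task.create(
    project_name="examples",
    task_name="external task",
    repo="https://github.com/your-org/your-repo.git",
    script="train.py",
)
Does that make sense?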
CooperativeFox72
Could you try to run the docker, and then inside the docker try to do:
su root
whoami
No, it is zipped and stored, so in order to open the zipfile and read the files you have to download them.
That said, everything is cached, so if the machine already downloaded the dataset there is zero download / unzipping.
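For example (dataset project/name here are placeholders):
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
# downloads + unzips on the first call; any subsequent call on the same
# machine just returns the cached local folder
local_path = ds.get_local_copy()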
Make sense?
DilapidatedDucks58 Nice!
but it would be great to see predecessors of each experiment in the chain
So maybe we should add a "manual pipeline" to create the connection post-execution? Is this a one-time thing?
Maybe a service creating these flow charts?
Should we put them in the Project's readme? Or in the Pipeline section (coming soon)?
But a warning instead of an error would be good.
Yes, that makes sense, I'll make sure we do that
Does this sound like a reasonable workflow, or is there a better way maybe?
makes total sense to me, will be part of next RC 🙂
Need - in my CI, the url used is https but I need the ssh url to be used. I see that we can pass repo to Task.create but not Task.init
Are you cloning an existing Task, or creating a new one ?
AdventurousButterfly15 this one is quite self-contained:
https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py
So I guess pip install finished working
But the task is evidently not being executed.
This is very odd ... you can run the agent with debugging with --debug --foreground to see all the outputs and logs
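For example, something like this (assuming a daemon serving the default queue; adjust to your setup):
clearml-agent --debug daemon --foreground --queue default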
GiganticTurtle0 adding --stop to the exact daemon execution will stop it (meaning if you have multiple agents on the same machine launched with different parameters, just add the --stop to retire the specific one)
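For example, if the daemon was launched with clearml-agent daemon --queue default, then:
clearml-agent daemon --queue default --stop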
GreasyPenguin14 we never had troubles with Task.init (or any other clearml calls) and working with the pycharm debugger, we use it quite extensively ...
Actually on a very similar setup...
Could you send the full log?
Or maybe a code snippet to reproduce this behavior ?
(We did notice they fixed a few issues with the debugger in 2020.3.3 so it's worth upgrading)
DilapidatedDucks58
all our workers went down after starting the slack bot, is it expected?)
Oh dear... I can't see any connection... What is the last log you have there?
Could it be someone deleted the file? This is inside the temp venv folder, but it should not get there.
BTW: What's the TF / Keras version?
ZanyPig66 it sounds like you need to add the docker args for binding; just add the argument docker_args="-v /mnt/host:/mnt/container" to the Task.create call.
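Something along these lines (project/task names, repo, script and paths are placeholders):
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="with host bind mount",
    repo="https://github.com/your-org/your-repo.git",
    script="train.py",
    # passed verbatim to the docker run command
    docker_args="-v /mnt/host:/mnt/container",
)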
MelancholyElk85 assuming we are running with clearml 1.1.1, let's debug the pipeline. Instead of pipeline start/wait/stop, let's do:
pipeline.start_locally(run_pipeline_steps_locally=False)
Hi ReassuredTiger98
To distinguish between minio and S3 we use:
s3://bucket/file for the AWS S3 service, and s3://server:port/bucket/file for minio.
This means that if your S3 links had been s3://<minio-address>:<port>/bucket/file.bin, the UI would have popped up the credentials window.
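For example, fetching a file stored on minio (host/port/bucket here are placeholders):
from clearml import StorageManager

# the host:port in the URL is what routes this to the minio credentials
local_file = StorageManager.get_local_copy("s3://my-minio-host:9000/my-bucket/file.bin")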
Make sense?
Hi @<1523703397830627328:profile|CrookedMonkey33>
If you click on the "Task Information" (on the Version Info panel, right-hand side), it will open the Task details page; there you have the "hamburger" menu top right, where you have Publish.
(Maybe we should add that to the main right-click menu?!)
i have it deployed successfully with istio.
Nice!
the only thing we had to do to get it to work was to modify the nginx.conf in the webserver pod to allow http 1.1
I was under the impression we fixed that, let me check
Well (yes, I think), the environment section is used mostly for logging; the next version of the clearml-agent (due next week) will have full support for it, and the next release of clearml-server will add bash-script support.
Task.create will create a new Task (and return an object) but it does not do any auto-magic (like logging the console, tensorboard etc.)