
Hmm that is odd.
Can you verify with the latest from GitHub?
Is this reproducible with the pipeline example code?
I guess it won't due to the nature of services?
Correct, the k8s glue works differently. That said, I would actually use the Helm chart to spin up a pod with the agent in services mode and venv mode.
LittleShrimp86 can you post the full log of the pipeline? (something is odd here)
or at least stick to the requirements.txt file rather than the actual environment
You can also force it to log the requirements.txt with Task.force_requirements_env_freeze(requirements_file="requirements.txt") before calling task = Task.init(...)
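For reference, a minimal sketch of that call order (the project/task names are placeholders), assuming a requirements.txt file sits next to the script:

```python
from clearml import Task

# Must be called before Task.init() so the listed packages are logged
# instead of a full pip freeze of the local environment.
Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="requirements-from-file")
```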
Seems that the API has changed quite a bit over the last few versions.
Correct, notice that your old pipeline Tasks use the older package and will still work.
There seems to be no need for controller_task anymore, right?
Correct, you can just call pipeline.start() 🙂
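A minimal sketch of a pipeline driven only by the PipelineController (no separate controller_task); the project name and step Task names below are assumptions:

```python
from clearml import PipelineController

# The controller itself becomes a Task, so no explicit controller_task is needed.
pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")

# Each step clones an existing template Task (names are placeholders).
pipe.add_step(name="stage_one", base_task_project="examples", base_task_name="step1")
pipe.add_step(name="stage_two", parents=["stage_one"],
              base_task_project="examples", base_task_name="step2")

# start() launches the controller (by default it is enqueued on the "services" queue).
pipe.start()
```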
The pipeline creates the tasks, but never executes/enqueues them (they are all in Draft mode). No DAG graph appears in the RESULTS/PLOTS tab.
Which vers...
SillyPuppy19 I think this is a great idea, basically having the ability to have a callback function called before aborting/exiting the process.
Unfortunately, today abort gives the process 2 seconds to quit gracefully and then kills it. It was not designed to just send an abort signal, since more often than not that would not actually terminate the process.
Any chance I can ask you to open a GitHub issue and suggest the callback feature? I have a feeling a few more users ...
ElegantKangaroo44 it seems to work here?!
https://demoapp.trains.allegro.ai/projects/0e152d03acf94ae4bb1f3787e293a9f5/experiments/48907bb6e870479f8b230e6b564cd52e/output/metrics/plots
save off the "best" model instead of the last
Should be relatively easy to update the main Task with the best-performing model, no?
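A rough sketch of one way to do that, using OutputModel.update_weights() and made-up validate()/save_checkpoint() helpers as stand-ins for real training code:

```python
import random
from clearml import Task, OutputModel

def validate():
    # stand-in for a real validation metric
    return random.random()

def save_checkpoint(path="best_model.pt"):
    # stand-in for saving real weights
    with open(path, "w") as f:
        f.write("dummy weights")
    return path

task = Task.init(project_name="examples", task_name="best-model-tracking")
best_model = OutputModel(task=task, name="best")
best_score = float("-inf")

for epoch in range(10):
    score = validate()
    if score > best_score:
        best_score = score
        # re-register the weights only when the score improves, so the Task
        # ends up pointing at the best checkpoint rather than the last one
        best_model.update_weights(weights_filename=save_checkpoint())
```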
Hi @<1668427971179843584:profile|GrumpySeahorse51>
Could you provide the full stack log?
This error seems to originate from psutil (which is used under the hood), but it lacks the clearml-session context.
Sorry, on the remote machine (i.e. enqueue it and let the agent run it), this will also log the print 🙂
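As a small illustration (the queue name is an assumption), a script can send itself to an agent with execute_remotely(), and anything it prints while the agent runs it ends up in the Task's console log:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote-print-demo")

# Stop the local run here and enqueue the script for an agent to execute
# ("default" is just an assumed queue name).
task.execute_remotely(queue_name="default", exit_process=True)

# When the agent runs the Task, this print is captured in the console log.
print("hello from the remote machine")
```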
can you bump me to that thread?
https://clearml.slack.com/archives/CTK20V944/p1630610430171200
I realise I'll need to catalogue all the dataset ids created by people separately on a spreadsheet.
Okay, this part I missed: why would you need to add an additional "catalog" when you have the UI?
DilapidatedDucks58 Nice!
but it would be great to see predecessors of each experiment in the chain
So maybe we should add a "manual pipeline" to create the connection post execution? Is this a one-time thing?
Maybe a service creating these flow charts?
Should we put them in the Project's readme? Or in the Pipeline section (coming soon)?
MelancholyElk85
After I set base docker for pipeline controller task, I cannot clone the repo...
What do you mean by that?
Also, how do you set the PipelineController base_docker_image? (I'm assuming this is needed to run the pipeline logic, is that correct?)
One last thing: make sure you spin up the pod container in privileged mode, because the trains-agent docker will spin a sibling docker for your actual experiment.
In the UI you can see all the agents and their IDs
Then you can do:
clearml-agent daemon --stop <agent id>
Hi GreasyPenguin14
- Did using auto_connect_frameworks={'pytorch': False} solve the issue? (I imagine it did)
- Maybe we should have the option to have wildcard support, so I will only auto-log based on filename. Basically using auto_connect_frameworks={'pytorch': "model*.pt"} would only auto-log the model files saved/logged, wdyt?
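For reference, a small sketch of the form that works today, with the wildcard variant shown only as the proposal from the message above (not a guaranteed API):

```python
from clearml import Task

# Supported today: turn off automatic PyTorch model logging entirely.
task = Task.init(
    project_name="examples",
    task_name="no-auto-pytorch",
    auto_connect_frameworks={"pytorch": False},
)

# Proposed wildcard form (only auto-log files matching the pattern);
# check your installed clearml version before relying on it:
# auto_connect_frameworks={"pytorch": "model*.pt"}
```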
Hi UptightMouse31
First, thank you 🙂
And to your question:
variable in the project is the kpi,
You mean like adding it to the experiment table and getting a kind of leaderboard?
WickedGoat98
for such pods instantiating additional workers listening on queues
I would recommend creating a "devops" user and having its credentials spread across all the agents. Sounds good?
EDIT:
There is no limit on the number of users on the system, so log in as a new one and create credentials in the "profile" page :)
Hi @<1541954607595393024:profile|BattyCrocodile47>
Is this on your self-hosted machine?
ConvolutedChicken69
does it take the agent off the queue? Does it know it's not available to take tasks?
You mean, will it "release" the GPU (i.e. the agent will pull another Task)?
If so, then no, it will not. An "Interactive Session" is (from the agent's perspective) a Task that will end at some point, and the agent will continue to monitor and run it until you manually close it. The idea is that you are actually using the GPU, hence no one else can run a job on it.
To shut it down, ...
If I checkout/download dataset D on a new machine, it will have to download/extract 15GB worth of data instead of 3GB, right? At least I cannot imagine how you would extract the 3GB of individual files out of zip archives on S3.
Yes, I'm not sure there is an interface to extract only partial files from the zip (although worth checking).
I also remember there is a GitHub issue with uploading a 50GB dataset, and the bottom line is, we should support setting chunk size, so that we can uploa...
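If your SDK version already exposes it, a chunked dataset upload might look like the sketch below; the chunk_size argument (and its MB units) is an assumption to verify against your installed clearml version, and the paths/names are placeholders:

```python
from clearml import Dataset

ds = Dataset.create(dataset_name="big-dataset", dataset_project="examples")
ds.add_files("/path/to/data")  # local data directory (placeholder)

# chunk_size splits the dataset archive into smaller chunks instead of
# one huge zip -- verify the argument exists in your clearml version.
ds.upload(chunk_size=512)
ds.finalize()
```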
It is http btw, I don't know why it logged https://
This is odd, could it be that it automatically forwards to https?
I would try the certificate check thing first
DeliciousBluewhale87 Is it ML or DL serving you are after?
Hi RotundSquirrel78
Could those be the example experiments?
Are you running your own server, or is it the SaaS free-tier server?
What's the error you are getting?
(open the browser's web developer tools and see if you get something in the console log)
Hi VexedCat68
Could it be the Python version is not the same? (this is the only reason it would not find a specific Python package version)
Should work in all cases: plotly/matplotlib/scalar report.
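For completeness, a minimal sketch of explicit reporting for all three (project, task, and series names are placeholders):

```python
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from clearml import Task

task = Task.init(project_name="examples", task_name="reporting-demo")
logger = task.get_logger()

# scalar report
logger.report_scalar(title="loss", series="train", value=0.42, iteration=1)

# matplotlib figure
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [1.0, 0.5, 0.25])
logger.report_matplotlib_figure(title="curve", series="matplotlib", figure=fig, iteration=1)

# plotly figure
plotly_fig = go.Figure(data=go.Scatter(y=[1.0, 0.5, 0.25]))
logger.report_plotly(title="curve", series="plotly", figure=plotly_fig, iteration=1)
```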