
Reputation
Badges 1
25 × Eureka!Ad1. yes, think this is kind of bug. Using _task to get pipeline input values is a little bit ugly
Good point, let;s fix it 🙂
new pipeline is built from scratch (all steps etc), but by clicking "NEW RUN" in GUI it just reuse existing pipeline. Is it correct?
Oh I think I understand what happens, the way the pipeline logic is built, is that the "DAG" is created the first time the code runs, then when you re-run the pipeline step it serializes the DAG from the Task/backend.
Th...
GiganticTurtle0 this one worked for me 🙂
` from clearml import Task
from clearml.automation.controller import PipelineDecorator
@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue1")
def step_1(msg: str):
msg += "\nI've survived step 1!"
return msg
@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue2")
def step_2(msg: str):
msg += "\nI've also survived step 2!"
return msg
@PipelineDecorator.component(return_values=["m...
This doesn't seem to be running inside a container...
What's the clearml-agent launch command you are using ? (i.e. do you have --docker flag)
Maybe we should rename it?! it actually creates a Task but will not auto connect it...
Hmm is this similar to this one https://allegroai-trains.slack.com/archives/CTK20V944/p1597845996171600?thread_ts=1597845996.171600&cid=CTK20V944
StaleButterfly40 just making sure I understand, are we trying to solve the "import offline zip file/folder" issue, where we create multiple Tasks (i.e. Task per import)? Or are you suggesting the Actual task (the one running in offline mode) needs support for continue-previous execution ?
you need to set
CLEARML_DEFAULT_BASE_SERVE_URL:
So it knows how to access itself
Since pytorch is a special example (the agent will pick the correct pytorch based on the installed CUDA) , the agent will first make sure the file is downloaded, and then pass the resolving for pip to decide if it necessary to install. (bottom line, we downloaded the torch for no reason but it is cached so no real harm done) It might be the second package needs a specific numpy version... this resolving is don't by pip, not the agent specifically. Anyhow --system-site-packages is applicable o...
When we enqueue the task using the web-ui we have the above error
ShallowGoldfish8 I think I understand the issue,
basically I think the issue is:task.connect(model_params, 'model_params')
Since this is a nested dict:model_params = { "loss_function": "Logloss", "eval_metric": "AUC", "class_weights": {0: 1, 1: 60}, "learning_rate": 0.1 }
The class_weights is stored as a String key, but catboost expects "int" key, hence it fails.
One op...
How can I reproduce it?
I still don't get resource logging when I run in an agent.
@<1533620191232004096:profile|NuttyLobster9> there should be no difference ... are we still talking about <30 sec? or a sleep test? (no resource logging at all?)
have a separate task that is logging metrics with tensorboard. When running locally, I see the metrics appear in the "scalars" tab in ClearML, but when running in an agent, nothing. Any suggestions on where to look?
This is odd and somewhat consistent with actu...
I'm assuming these are the Only packages that are imported directly (i.e. pandas requires other packages but the code imports pandas so this is what listed).
The way ClearML detect packages, it first tries to understand if this is a "standalone" scrip, if it does, than only imports in the main script are logged. Then if it "thinks" this is not a standalone script, then it will analyze the entire repository.
make sense ?
Yes, offline got broken in 1.3.0 😞 , RC fixed it:pip install clearml==1.3.1rc0
Stable release later this week
This looks like 'feast' error, could it be a configuration missing?
Ohhh I see, yes this is regexp matching, if you want the exact match:'^{}$'.format(name)
Hi CheerfulGorilla72
see
Notice all posts on that channel are @ channel 🙂
Hi JitteryCoyote63
Is this close ?
https://github.com/allegroai/clearml/issues/283
VexedCat68
. So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
Are you uploading the checkpoints manually with artifacts? or is it autologged & uploaded ?
Also why no reuse and overwrite older checkpoints ?
You can always log it manually:from clearml import InputModel input_model = InputModel.import_model(weights_url='/tmp/keras_example/weight.6.hdf5')
Hi @<1546665638829756416:profile|LovelyChimpanzee39>
anyone know what params I need to pass in order to enable it?
we feel you there 😞 this is an actual plotly feature that we really want to add, but kind of out of our hands: None
feel free to support as there 🤞
I'm not sure I'm the right person to answer that, but yes my understanding is that this is a Scale/Enterprise tier feature, at least for the time being.
Hi RoundMosquito25
however they are not visible either in:
But can you see them in the UI?
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
Are you running in docker mode ?
If so you can actually delete mapped files (they will still be available inside the docker), just make sure you delete them X hours after they were created, and you should be fine.
wdyt?
Maybe that's the issue :
https://github.com/googleapis/python-storage/issues/74#issuecomment-602487082
- Yes the challenge is mostly around defining the interface. Regarding packaging, I'm thinking a similar approach to the pipeline decorator, wdyt?
- Clearml agents will be running on k8s, but the main caveat is that I cannot think of a way to help with the deployment, at the end it will be kubectl that users will have to call in order to spin the containers with the agents, maybe a simple CLI to do that for you?