Hi SubstantialElk6
but in terms of data provenance, it's not clear how I can associate the data versions with the processes that created them.
I think DeliciousBluewhale87's approach is what we are aiming for, but with code.
So using clearml-data from the CLI is basically storing/versioning of files (with diff-based storage etc., but still).
What you are after (I think) is using the programmatic Dataset class in your preprocessing code, to create the Dataset from code, this a...
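For reference, a minimal sketch of what that could look like (the dataset/project names and path below are placeholders):
```python
from clearml import Dataset

# create a new dataset version from inside the preprocessing code
ds = Dataset.create(
    dataset_name="my_dataset",        # placeholder name
    dataset_project="data_project",   # placeholder project
)
ds.add_files(path="./preprocessed/")  # files produced by this process
ds.upload()                           # push the files to the storage backend
ds.finalize()                         # close this version
print(ds.id)                          # the ID that ties the data version to the run
```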
Hi ExuberantParrot61, the odd thing is this message:
No repository found, storing script code instead
when you are actually running from inside the repo...
Is it saying that on a specific step, or is it on the pipeline logic itself?
Also any chance you can share the full console output ?
BTW:
you can manually specify a repo branch for a step:
https://github.com/allegroai/clearml/blob/a492ee50fbf78d5ae07b603445f4983feb9da8df/clearml/automation/controller.py#L2841
Example:
https:/...
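Something along these lines (a sketch; the repo URL, branch name and step body are all placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    repo="https://github.com/user/repo.git",  # placeholder repository
    repo_branch="my-branch",                  # branch to check out for this step
)
def my_step():
    pass
```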
Oh, is your pipeline code a part of a git repository ?
Just to clarify, where do I run the second command?
Anywhere, just open a python console and import the offline task:
```python
from trains import Task
Task.import_offline_session('./my_task_aaa.zip')
```
Related, how to I specify in my code the cache_dir where the zip is saved?
This is the Trains cache folder, you can set it in the trains.conf file:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/docs/trains.conf#L24
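i.e. something like this (a sketch; the exact path is up to you):
```
sdk {
    storage {
        cache {
            # base folder for the local cache (the offline zips end up under here)
            default_base_dir: "~/.trains/cache"
        }
    }
}
```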
additionally, I found that the clearml==1.0.5 package is able to find these partial changes; newer versions find nothing at all, maybe because it's always comparing against remote
Hmm it was always from remote...
it is actually doing the following:
```
git rev-parse --abbrev-ref --symbolic-full-name @{u}
```
Then, with the branch name output:
```
git diff --submodule=diff <add_branch_name_here>
```
But you can get that directly, Task.get_task(...).artifacts[name].url , no? Am I missing something?
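i.e. a quick sketch (the task ID and artifact name are placeholders):
```python
from clearml import Task

task = Task.get_task(task_id="aabbcc")  # placeholder task ID
url = task.artifacts["my_artifact"].url                      # remote URL
local_path = task.artifacts["my_artifact"].get_local_copy()  # or fetch it locally
```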
Hi DilapidatedDucks58
apologies, this thread slipped away.
I double checked, the server will not allow you to overwrite it (meaning having it fixed will require releasing a server version, which usually takes longer)
That said, maybe we can pass an argument to Task.init so it ignores it? wdyt?
SmallDeer34 No worries, I'm happy to hear the issue disappeared 🙂
Switching to a process Pool might be a bit of overkill here (I think)
wdyt?
Hi IrritableOwl63
Yes this seems like a docker setup issue 🙂
either run the agent with sudo (not really recommended 😉 ) or add your user to the docker group (see the post-install steps):
https://docs.docker.com/engine/install/linux-postinstall/
This seems more complicated than I thought... I think you are correct, it fails to load the entire module. Let me check what I can do
And can you see your Prometheus in your Grafana?
```python
task.connect(model_config)
task.connect(DataAugConfig)
```
If these are separate dictionaries, you should probably use two sections:
```python
task.connect(model_config, name="model config")
task.connect(DataAugConfig, name="data aug")
```
It is still getting stuck.
I notice that one of the scalars that gets logged early is logging the epoch, while the remaining scalars seem to be logging iterations, because the iteration value is 1355 instead of 26
Wait, so you are seeing some scalars?...
Hi @<1730758665054457856:profile|MysteriousCrab4>
do I get to have the autoscaler feature,
You have the open source one here: None
In the managed Pro tier you have the fancy UI AWS/GCP autoscaler (with some additional features)
And there are the Scale/Enterprise tiers with more sophisticated features, like Vault, on top of that
I see it's a plotly plot, even though I report a matplotlib one
ClearML tries to convert matplotlib figures into plotly objects so they are interactive; if it fails, it falls back to a static image, as in matplotlib
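If you want to force the static image, I believe you can report the figure explicitly, something like this (project/task names and titles are placeholders):
```python
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="plots")  # placeholders
fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])

# report_image=True should skip the plotly conversion and log a static image
task.get_logger().report_matplotlib_figure(
    title="my plot", series="series A", figure=fig, report_image=True
)
```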
I will probably just use an absolute path everywhere to be robust against different machine user accounts: /home/user/trains.conf
That sounds like good practice
Other than the wrong trains.conf, I can't think of anything else... Well, maybe if you have AWS environment variables with credentials? They will override the conf file
Hmm, I think I need more info to try and reproduce it; what exactly did you do, and what was the expected behavior vs reality?
What's the exact error you are getting ?
(Maybe this is a privilege error on the cache folder; which folders is it using? You can see them in the configuration as well)
GiganticTurtle0 this one worked for me 🙂
```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue1")
def step_1(msg: str):
    msg += "\nI've survived step 1!"
    return msg

@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue2")
def step_2(msg: str):
    msg += "\nI've also survived step 2!"
    return msg

@PipelineDecorator.component(return_values=["m...
```
I see, is this what you are looking for?
https://allegro.ai/docs/task.html#trains.task.Task.init
continue_last_task='task_id'
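i.e. a sketch (the project/task names and the ID are placeholders; I believe the keyword is the same for trains and clearml):
```python
from trains import Task

# continue logging into a previously executed task instead of creating a new one
task = Task.init(
    project_name="examples",            # placeholder
    task_name="my_experiment",          # placeholder
    continue_last_task="task_id_here",  # ID of the task to continue
)
```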
Sure, you can pass ${stage_data.id} as an argument, and the actual Task will get the referenced step's Task ID of the current execution.
Make sense?
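For example, with a PipelineController (a sketch; the projects, task names and parameter name are placeholders):
```python
from clearml.automation.controller import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
pipe.add_step(
    name="stage_data",
    base_task_project="examples",
    base_task_name="data task",
)
pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="process task",
    # "${stage_data.id}" is replaced at runtime with that step's Task ID
    parameter_override={"General/dataset_task_id": "${stage_data.id}"},
)
pipe.start()
```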
Is there any contingency plan for an agent to continue running a task without reading the repository on the GitLab server?
Not sure what can be done ... any suggestions ?
At runtime, can I ask the agent to use some cached repository?
sometimes you will have it (as the agent stores a cached copy), but I would hardly count on it (and it might be in different states on different machines...)
... (due to regular maintenance service, something I cannot control).
Maybe let "th...
For example, opening a project or experiment page might take half a minute.
This implies a MongoDB performance issue
What's the size of the mongo DB?
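(If it helps, assuming the default trains-server docker-compose layout, which mounts the mongo data under /opt/trains/data/mongo, you can check with something like:
```
du -sh /opt/trains/data/mongo
```
the mount path is an assumption, adjust it to your setup)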
but who exactly executes the agent in this case?
With both execute / build commands, you execute it on your machine, for debugging purposes. Make sense?
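For reference, a sketch of both (the task ID and target folder are placeholders):
```
# run the task locally in the current environment, for debugging
clearml-agent execute --id aabbcc123

# only build the task's environment, without executing it
clearml-agent build --id aabbcc123 --target ./my_task_env
```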
thought the agent created a new conda env and installed all packages
It does, but I was asking what is written on the Original Task (the one created when you executed the code on your laptop, not when the agent was executing it). When the agent is executing the Task, it writes back all the packages of the entire venv it created. When the Task is run manually, it will list only the packages you import directly (i.e. from package or import package; it actually analyses the code)
My point...
Oh no, you are absolutely correct, it is broken (I mean I have no idea why it lists Hydra, or how it got there). I will let the guys know and fix it.
Bottom line, after you clone it, please edit the installed packages and remove the "Hydra" line and replace with just "hydra-core" (no need for version).
The format is the same as requirements.txt and will affect the venv created by the agent
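i.e. in the "Installed packages" edit box, something like:
```
# before:
Hydra
# after (requirements.txt format, no version pin needed):
hydra-core
```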
UnevenDolphin73
```
fatal: could not read Username for '': terminal prompts disabled
fatal: clone of '' into submodule path '/root/.clearml/vcs-cache/xxx.60db3666b11ac2df511a851e269817ef/xxx/xxx' failed
```
It seems it tries to clone a submodule and fails due to missing keys for the submodule.
https://stackoverflow.com/questions/7714326/git-submodule-url-not-including-username
wdyt?
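(per the SO answer, a relative submodule URL is resolved against the parent repo's origin, so the same credentials/keys apply; a sketch with placeholder names:
```
[submodule "xxx"]
    path = xxx
    url = ../xxx.git
```
)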
... if we have direct access to the Kubernetes worker when we run K8S glue?
Correct, if you have direct access to the Node (on your k8s cluster) from your laptop (assuming the clearml-session is running from the laptop), everything should work
Hi @<1523701323046850560:profile|OutrageousSheep60>
What do you mean by "in clearml server" ? I do not see any reason a subprocess call from a Task will be an issue. What am I missing ?