Hi @<1523701087100473344:profile|SuccessfulKoala55> , I think the issue is where to put the connect_configuration call. I can't put it inside run_pipeline because it's only running remotely and it doesn't have access to the file, and I can't put it in the script before the call to run_pipeline since the task has not been initialized yet.
I've uploaded an example here for simplicity: None
Also: what's the purpose of storing the pipeline arguments as artifacts then? When it runs remotely it still runs the main script as entrypoint and not the pipeline function directly, so all the arguments will be replaced by what is passed to the function during the remote execution, right?
@<1523701435869433856:profile|SmugDolphin23> then the issue is that config is not set. I also tried with:
import yaml
import argparse
from my_pipeline.pipeline import run_pipeline
from clearml import Task
parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)
if __name__ == '__main__':
    if Task.running_locally():
        args = parser.parse_args()
        with open(args.config) as f:
            config = yaml.load(f, yaml.FullLoader)
    else:
        ...
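To make the intent clearer, here is the full sketch I'm converging on (the empty dict in the else branch is just a placeholder; I'm assuming the arguments stored by the pipeline controller override the call-site value when running remotely, which is exactly what I'd like to confirm):
import yaml
import argparse
from clearml import Task
from my_pipeline.pipeline import run_pipeline

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)

if __name__ == '__main__':
    if Task.running_locally():
        # locally the YAML is on disk, so parse it and pass it as a plain dict
        args = parser.parse_args()
        with open(args.config) as f:
            config = yaml.load(f, yaml.FullLoader)
    else:
        # remotely the file is not available; rely on the stored pipeline
        # arguments to replace this placeholder (assumption, not verified)
        config = {}
    run_pipeline(config)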
Hi @<1523701070390366208:profile|CostlyOstrich36> , sorry, how would you use them exactly?
In the meantime, any suggestion on how to set the working_dir in any other way? We are moving to this new code structure and I’d like to have clearml up and running
Is there any other way to specify it besides directly in the component?
Hi @<1523701205467926528:profile|AgitatedDove14> , in my case all the code is in a subfolder, like projects/main , so if I run from the git root it can’t find the local modules
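For what it's worth, the only programmatic workaround I can think of is forcing the working dir on the task itself, something like this (I'm assuming Task.set_script accepts a working_dir argument, I haven't verified it, and the names below are placeholders):
from clearml import Task

task = Task.init(project_name='my_project', task_name='main')  # placeholder names
# point the agent at the subfolder that holds the code and the pyproject.toml
task.set_script(working_dir='projects/main', entry_point='main.py')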
Yes, or even just something like task.get_size()
For instance, I have in my_pipeline/__main__.py :
import yaml
import argparse
from my_pipeline.pipeline import run_pipeline
parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)
if __name__ == '__main__':
    args = parser.parse_args()
    with open(args.config) as f:
        config = yaml.load(f, yaml.FullLoader)
    run_pipeline(config)
and in my_pipeline/pipeline.py :
@PipelineDecorator.pipeline(
    name='Main',
    project=...
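i.e. spelled out fully, the decorated function looks roughly like this (project name, version and the body are placeholders, not my real code):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name='Main', project='my_project', version='0.1')
def run_pipeline(config: dict):
    # config is the dict parsed from the YAML in __main__.py;
    # the actual steps are separate @PipelineDecorator.component functions
    print(config)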
Thanks @<1523701087100473344:profile|SuccessfulKoala55> , I’ll take a look
So the longest experiment I have takes ~800KB in logs. I have tens of plotly plots logged manually; how are they stored internally? I tried to export them to JSON and they don't take more than 50KB each, but maybe they take more space internally?
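(For reference, this is roughly how I estimated the per-figure size; the figure below is just a stand-in for one of the logged plots:)
import plotly.graph_objects as go

fig = go.Figure(go.Scatter(y=list(range(1000))))  # stand-in for one of my logged figures
print(f'{len(fig.to_json()) / 1024:.1f} KB')       # size of the serialized JSON payload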
Hi @<1523701205467926528:profile|AgitatedDove14> , I already tried to check manually in the web UI for some anomalous file, e.g. by downloading the log files or exporting the metrics plots, but I couldn't find anything that takes more than 100KB, and I'm already at 300MB of usage with just 15 tasks. Is it possible to get more info using some Python APIs?
Hi @<1523701070390366208:profile|CostlyOstrich36> , thanks, but in this case I'd also like to get the ids of the running workers, so that I can selectively stop some of them. Is that possible somehow?
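Something along these lines is what I'm after, if the APIClient exposes it (the field names are my guess, not verified):
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    # worker.id would let me decide which agents to stop selectively
    print(worker.id)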
So the issue is that I would like to keep the list of hyperparams and metrics; if I clean them up, I would lose them. But I agree that I might be overthinking it
Oh nice thanks, will try with that combination
Basically I want to run a function in parallel and have that function create multiple tasks. So I was thinking of setting up a pipeline to have this hierarchy: main -> parallelized_function -> init_task_function . But I guess I could also just call Task.create in init_task_function and achieve the same
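i.e. something along these lines (project, queue and script path are placeholders):
from clearml import Task

def init_task_function(idx):
    # create one standalone task per parallel branch and enqueue it
    child = Task.create(
        project_name='my_project',
        task_name=f'branch_{idx}',
        script='my_pipeline/branch.py',  # placeholder entry point
    )
    Task.enqueue(child, queue_name='default')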
Hi @<1523701435869433856:profile|SmugDolphin23> , I just tried it but Task.current_task() returns None even when running remotely
Would just having some python API be an option? It would be more than enough to check what is causing this, and it would be called infrequently
Hi @<1523701070390366208:profile|CostlyOstrich36> , yes it's specifically with datasets. Probably the option I need is size.max_used_bytes but it looks like it's available only for the enterprise plan? Is there any other way to clean the cache after each task?
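In the meantime I could just wipe the cache folder myself at the end of each task, something like this (assuming the default cache location; clearml.conf can override it):
import shutil
from pathlib import Path

cache_dir = Path.home() / '.clearml' / 'cache'  # default location, adjust if overridden
if cache_dir.exists():
    shutil.rmtree(cache_dir)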
Hi @<1523701087100473344:profile|SuccessfulKoala55> , thanks, and how can I get the "id" to use with update for the dataset folder case?
Hi @<1523701087100473344:profile|SuccessfulKoala55> , is there any workaround?
This would work to load the local modules, but I’m also using poetry and the pyproject.toml is in the subdirectory, so the agent won’t install any dependency if I don’t set the work_dir
Hi @<1523701087100473344:profile|SuccessfulKoala55> , I'm uploading some debug images, but they are around 300KB each and fewer than 10 per experiment. Also, aren't debug images counted as artifacts for the quota?
Yes I can read it using this. I was just wondering if there is a way to read the file downloaded directly from the UI
I have some git diffs logged but they are very small. For the configurations I saw that the dataset tasks have a fairly large "Dataset Content" config (~2MB), but I only have 5 dataset tasks
What is not clear to me is how you would use the callbacks to run the step locally. Are there some properties that need to be set in the task? I see that there is a start_controller_locally option for the main @PipelineDecorator.pipeline , but I don't see it for @PipelineDecorator.component
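For reference, the only related calls I've found are class-level, e.g. (as far as I can tell they affect the whole pipeline, not a single component):
from clearml.automation.controller import PipelineDecorator
from my_pipeline.pipeline import run_pipeline

# must be called before the pipeline function is invoked
PipelineDecorator.run_locally()       # controller and steps run as local subprocesses
# PipelineDecorator.debug_pipeline()  # alternative: everything in the current process

run_pipeline(config={})               # placeholder arguments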
Yes, these are the only actions. The task is moved correctly though, I can see it under f'{config.project_id}/.pipelines' in the UI; the issue is that it's not visible under PIPELINES . I haven't tried with task- or function-based pipelines yet.
Hi, yes I’m using the same clearml.conf on the agent, in the logs I can see that console_cr_flush_period is set to 30