or, if you want the steps to be run by the agent, set run_pipeline_steps_locally=False
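For example, with a PipelineController it would look roughly like this (a sketch; the pipeline name, project and added steps are placeholders):
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0")
# ... add your steps here ...

# run the controller logic locally, but let the agent execute each step
pipe.start_locally(run_pipeline_steps_locally=False)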
Hi NonchalantGiraffe17 ! Thanks for reporting this. It would be easier for us to check if there is something wrong with ClearML if we knew the number and sizes of the files you are trying to upload (content is not relevant). Could you maybe provide those?
Perfect! Can you please provide the sizes of the files of the other 2 chunks as well?
@<1523721697604145152:profile|YummyWhale40> are you able to manually save models from SageMaker using OutputModel? None
@<1590514584836378624:profile|AmiableSeaturtle81> weren't you using https for the s3 host? maybe the issue has something to do with that?
Can you actually add the bucket to the credentials just to try it out?
Also, can you check that this snippet works for you (with your creds):
import boto3
import json
import six
key = ""
secret = ""
host = "our_host.com"
bucket_name = "bucket"
profile = None
filename = "test"
data = {"test": "data"}
boto_session = boto3.Session(aws_access_key_id=key, aws_secret_access_key=secret, profile_name=profile)
endpoint = "https://" + host
boto_resource = boto_session.resource("s3", region_name...
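In case it helps, this is roughly how that check continues (a sketch, not the exact original snippet; adjust endpoint_url/region for your storage):
boto_resource = boto_session.resource("s3", endpoint_url=endpoint)
bucket = boto_resource.Bucket(bucket_name)
# upload a tiny JSON object and read it back to verify the credentials work
bucket.put_object(Key=filename, Body=json.dumps(data).encode("utf-8"))
obj = boto_resource.Object(bucket_name, filename)
print(obj.get()["Body"].read())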
Hi @<1578555761724755968:profile|GrievingKoala83> ! We have released clearml==1.16.3rc1 which should solve the issue now. Just specify task.launch_multi_node(nodes, devices=gpus). For example:
import sys
import os
from argparse import ArgumentParser
import pytorch_lightning as pl
from pytorch_lightning.strategies.ddp import DDPStrategy
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from...
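The relevant part is roughly this (a sketch; the project/task names, node and GPU counts, and trainer arguments are placeholders):
from clearml import Task
import pytorch_lightning as pl

nodes, gpus = 2, 4  # placeholders

task = Task.init(project_name="examples", task_name="multi-node")
# let ClearML set up and coordinate the additional node tasks
task.launch_multi_node(nodes, devices=gpus)

trainer = pl.Trainer(accelerator="gpu", devices=gpus, num_nodes=nodes, strategy="ddp")
# trainer.fit(model, datamodule=...)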
You should alter the name (or else the model will be overwritten)
Hi @<1576381444509405184:profile|ManiacalLizard2> ! Can you please share a code snippet that I could run to investigate the issue?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! What function are you using to upload the data?
Hi DeliciousKoala34 . I was able to reproduce your issue. I'm now looking for a solution for your problem. Thank you
FierceHamster54 As long as you are not forking, you need to use Task.init such that the libraries you are using get patched in the child process. You don't need to specify the project_name, task_name or output_uri. You could try locally as well with a minimal example to check that everything works after calling Task.init.
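Something along these lines (a sketch; spawning instead of forking, with Task.init called inside the child as described):
from multiprocessing import get_context
from clearml import Task

def worker():
    # calling Task.init inside the spawned child patches the frameworks it uses
    task = Task.init()
    # ... training / reporting code ...
    task.close()

if __name__ == "__main__":
    ctx = get_context("spawn")  # spawn, i.e. not forking
    p = ctx.Process(target=worker)
    p.start()
    p.join()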
basically, I think that the pipeline run starts from __main__ and not the pipeline function, which causes the file to be read
@<1668427963986612224:profile|GracefulCoral77> You can either create a child dataset or keep the same dataset, as long as it is not finalized. You can skip the finalization using the --skip-close argument. Anyhow, I can see why the current workflow is confusing. I will discuss it with the team, maybe we should allow syncing unfinalized datasets as well.
Hi @<1668427963986612224:profile|GracefulCoral77> ! The error is a bit misleading. What it actually means is that you shouldn't attempt to modify a finalized clearml dataset (I suppose that is what you are trying to achieve). Instead, you should create a new dataset that inherits from the finalized one and sync that dataset, or leave the dataset in an unfinalized state
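With the SDK it would look roughly like this (a sketch; project/dataset names and the folder path are placeholders):
from clearml import Dataset

# get the finalized dataset and create a child that inherits from it
parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
child = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)
# sync the local folder against the child; only the differences are added
child.sync_folder("/path/to/local/folder")
child.upload()
child.finalize()  # or leave it unfinalized if you want to keep syncing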
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! This is actually very weird. Does your pipeline fail when running the first step? What if you run the pipeline via "raw" python (i.e. by doing python3 your_script.py)?
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! Can you please try with clearml==1.13.3rc0? I believe we fixed this issue
Hi @<1702492411105644544:profile|YummyGrasshopper29> ! To enable caching while using a repo, you also need to specify a commit (as the repo might change, which would invalidate the caching). We will add a warning regarding this in the near future.
Regarding the imports: we are aware that there are some problems when executing the pipeline remotely as described. At the moment, appending to sys.path is one of the only solutions (other than making utils a package on your local machine so...
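As an illustration, something like this (a sketch using the decorator syntax; the repo URL and commit are placeholders):
from clearml import PipelineDecorator

@PipelineDecorator.component(
    cache=True,                    # enable caching for this step
    repo="https://github.com/org/repo.git",  # placeholder repo
    repo_commit="<commit-sha>",    # pin the commit so the cache stays valid
)
def preprocess(data_path: str):
    ...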
Hi ShortElephant92 ! Random images, audio files, tables (trimmed to a few rows) are sent as Debug Samples for preview. By default, they are sent to our servers. Check this function if you wish to log the samples to another destination https://clear.ml/docs/latest/docs/references/sdk/logger/#set_default_upload_destination .
You could also add these entries in your clearml.conf to not send any samples for preview:
sdk.dataset.preview.tabular.table_count: 0
sdk.dataset.preview.media.i...
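For example (a sketch; the bucket URI is a placeholder):
from clearml import Logger

# send debug samples (previews) to your own storage instead of the ClearML file server
Logger.current_logger().set_default_upload_destination("s3://my-bucket/previews")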
Regarding number 2, that is indeed a bug and we will try to fix it as soon as possible
ClearML does not officially support a remotely executed task spawning more tasks. We do support this through pipelines, if that helps you somehow. Note that doing things the way you do them right now might break some other functionality.
Anyway, I will talk with the team and maybe change this behaviour because it should be easy 👍
@<1523703472304689152:profile|UpsetTurkey67> It would be great if you could write a script we could use to reproduce
Hi @<1594863230964994048:profile|DangerousBee35> ! This GH issue might be relevant to you: None
That is a clear bug to me. Can you please open a GH issue?
@<1523701083040387072:profile|UnevenDolphin73> are you composing the code you want to execute remotely by copy pasting it from various cells in one standalone cell?
@<1523703472304689152:profile|UpsetTurkey67> great, thank you! We are taking a look
UnevenDolphin73 looks like we clear all loggers when a task is closed, not just clearml ones. this is the problem
@<1523701240951738368:profile|RoundMosquito25> sorry, actually add_pipeline_tags will add the tag pipe: ID to all steps, not a predefined tag. You will need to set the tags argument to your desired tags for each step individually
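Roughly like this (a sketch; the step function and tag values are placeholders, and I'm assuming the step definition accepts the tags argument as described above):
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", add_pipeline_tags=True)
pipe.add_function_step(
    name="step_one",
    function=process_data,           # placeholder function
    tags=["my-tag", "another-tag"],  # per-step tags
)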
1.10.2 should be old enough