Hi OutrageousSheep60! The list_datasets function is currently broken and will be fixed in the next release
What about this script? (Replace with your credentials, and comment out the credentials in clearml.conf for now.)
from clearml import Task
from clearml.storage.helper import StorageHelper

task = Task.init("test", "test")
task.setup_aws_upload(
    bucket="bucket1",
    host="localhost:9000",
    key="",  # your access key
    secret="",  # your secret key
    profile=None,
    secure=True
)
helper = StorageHelper.get("")  # pass your storage URI here, e.g. "s3://bucket1"
FiercePenguin76 Looks like there is actually a bug when loading models remotely. We will try to fix this ASAP
Hi @<1523701504827985920:profile|SubstantialElk6>!
Regarding 1: .pth files get pickled.
The flow is like this:
- The step is created by the controller by writing some code to a file and running that file in Python
- The following line is run in the step when returning values: None
- This is eventually run: [None](https://github.com/allegroai/clearml/blob/cbd...
Hi again, @<1526734383564722176:profile|BoredBat47>! I actually took a closer look at this. The config file should look like this:
s3 {
    key: "KEY"
    secret: "SECRET"
    use_credentials_chain: false
    credentials: [
        {
            host: "myendpoint:443"  # no http(s):// and no s3:// prefix, also no bucket name
            key: "KEY"
            secret: "SECRET"
            secure: true  # ...
Hi @<1523701868901961728:profile|ReassuredTiger98>! Looks like the task actually somehow gets run by both an agent and locally at the same time, so one of them is aborted. Any idea why this might happen?
otherwise, you could run this as a hack:
dataset._dataset_file_entries = {
    k: v
    for k, v in dataset._dataset_file_entries.items()  # note: dataset, not self
    if k not in files_to_remove  # you need to define files_to_remove yourself
}
then call dataset.remove_files with a path that doesn't exist in the dataset.
Hi @<1523707653782507520:profile|MelancholyElk85>! I don't think this is possible at the moment 😕 Feel free to open a GH issue that proposes this feature tho
Hi HandsomeGiraffe70 ! We found the cause for this problem, we will release a fix ASAP
Perfect! Can you please provide the sizes of the files of the other 2 chunks as well?
Hi @<1532532498972545024:profile|LittleReindeer37> @<1523701205467926528:profile|AgitatedDove14>
I got the session with a bit of "hacking".
See this script:
import boto3, requests, json
from urllib.parse import urlparse

def get_notebook_data():
    log_path = "/opt/ml/metadata/resource-metadata.json"
    with open(log_path, "r") as logs:
        _logs = json.load(logs)
    return _logs

notebook_data = get_notebook_data()
client = boto3.client("sagemaker")
response = client.create_...
We used to have "<=20" as the default pip version in the agent. Looks like this default value still exists on your machine, but that version of pip doesn't know how to install your version of pytorch...
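If that's the case, you could override the pinned pip version in your clearml.conf. A minimal sketch (the exact version spec here is just an example, pick whatever suits your environment):
agent {
    package_manager {
        # override the default pinned pip version used when building task environments
        pip_version: ">=20.2,<23"
    }
}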
Btw, to specify a custom package, add the path to that package to your requirements.txt (the path can also be a GitHub link, for example).
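For instance, a requirements.txt could look something like this (the local path and repo URL are made up):
# a package from your local filesystem
/home/user/packages/my_custom_package
# or a package straight from a git repository
git+https://github.com/example-org/example-package.git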
Hi @<1546303293918023680:profile|MiniatureRobin9>! The PipelineController has a property called id, so just doing something like pipeline.id should be enough
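For example (the project/pipeline names here are placeholders):
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
print(pipe.id)  # the ID of the pipeline controller task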
Regarding 1: are you trying to delete the project from the UI? (I can't see an attached image in your message)
Hi @<1523701304709353472:profile|OddShrimp85>! Can you please share the logs (make sure to remove any sensitive data, if it exists)
Hi @<1590514584836378624:profile|AmiableSeaturtle81>! Looks like remove_files doesn't support lists indeed. It does support paths with wildcards tho, if that helps.
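Something along these lines (the dataset ID and the wildcard pattern are placeholders):
from clearml import Dataset

ds = Dataset.get(dataset_id="<your_dataset_id>")
# remove_files accepts a wildcard path instead of a list
ds.remove_files("subfolder/*.ipynb")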
As a workaround for now, I would remove all the files from the dataset and add back only the ones you need, or just create a new dataset
Hi @<1676038099831885824:profile|BlushingCrocodile88>! We will soon try to merge a PR submitted via GitHub that will allow you to specify a list of files to be added to the dataset. You will then be able to do something like add_files(set(glob.glob("*")) - set(glob.glob("*.ipynb")))
are you running this locally or are you enqueueing the task (controller)?
do you have the logs of the agent that is supposed to run your pipeline? Maybe there is a clue there. I would also suggest enqueuing the pipeline to some other queue, or maybe even running the agent on your own machine if you don't already, and seeing what happens
Oh, I see what you mean. start will enqueue the pipeline so that it can be run remotely by an agent. I think what you want to call is pipe.start_locally(run_pipeline_steps_locally=True) (and get rid of the wait).
Hi PanickyMoth78 ! This will likely not make it into 1.9.0 (this will be the next version we release, most likely before Christmas). We will try to get the fix out in 1.9.1
Hi DangerousDragonfly8 ! Sorry for the late reply. I'm taking a look and will come back to you shortly
Hi @<1590514584836378624:profile|AmiableSeaturtle81>! Having tqdm installed in your environment might help
Hi @<1523701132025663488:profile|SlimyElephant79>! Looks like this is a bug on our part. We will fix this as soon as possible
UnevenDolphin73 Looks like we clear all loggers when a task is closed, not just the ClearML ones. This is the problem
Hi @<1546303293918023680:profile|MiniatureRobin9>! When it comes to pipelines built from functions/other tasks, this is not really supported. You could, however, cut the execution short when your step is being run by evaluating the return values of previous steps.
Note that you should be able to skip steps if you are using pipelines from decorators; see the sketch below.
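A minimal sketch of what I mean (all the names here are made up):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def check_condition():
    # pretend some real logic decides this
    return False

@PipelineDecorator.component()
def heavy_step():
    print("running the heavy step")

@PipelineDecorator.pipeline(name="conditional-example", project="examples", version="1.0.0")
def run_pipeline():
    # the pipeline body is plain python, so a step can simply
    # be skipped based on another step's return value
    if check_condition():
        heavy_step()

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    run_pipeline()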
Hi @<1694157594333024256:profile|DisturbedParrot38>! If you want to override the parameter, you could add a DiscreteParameterRange to hyper_parameters when calling HyperParameterOptimizer. The DiscreteParameterRange should have just one value: the value you want to override the parameter with.
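Something like this (the task ID, parameter name and metric names are placeholders):
from clearml.automation import DiscreteParameterRange, HyperParameterOptimizer

optimizer = HyperParameterOptimizer(
    base_task_id="<your_base_task_id>",
    hyper_parameters=[
        # a single-value range effectively overrides the parameter
        DiscreteParameterRange("General/my_param", values=["my_override"]),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    execution_queue="default",
)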
You could try setting the parameter to an empty string in order to mark it as cleared
Hi @<1715900760333488128:profile|ScaryShrimp33>! You can set the log level by setting the CLEARML_LOG_LEVEL env var before importing clearml. For example:
import os
os.environ["CLEARML_LOG_LEVEL"] = "ERROR"  # or str(logging.CRITICAL) / any other level also works
Note that the ClearML Monitor warning is most likely logged to stdout, in which case this message can't really be suppressed, but model-upload-related messages should be