Not sure I'm getting that. If you are loading the latest dataset task in your experiment task code, it should pick up the most up-to-date one.
change the report of the train_loss?
Are you referring to not sending the train_loss results?
Hi TenseOstrich47 , the StorageManager does use boto3 for those uploads (so if it's not supported by boto3, it's the same for StorageManager :/ )
Maybe you can use 'wait_for_upload' and delete the local file afterwards?
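A minimal sketch of that idea, assuming `clearml` is installed and configured (the helper name and the URLs are placeholders, not an official API beyond `StorageManager.upload_file`):

```python
import os

def upload_and_cleanup(local_path, remote_url):
    """Upload a file, block until the upload completes, then remove the local copy."""
    # Lazy import so the helper can be defined without clearml on the path.
    # wait_for_upload=True makes the call block until the upload is done,
    # so it is safe to delete the local file right after.
    from clearml import StorageManager
    StorageManager.upload_file(local_path, remote_url, wait_for_upload=True)
    os.remove(local_path)

# Example (placeholder bucket/key):
# upload_and_cleanup("model.pkl", "s3://my-bucket/models/model.pkl")
```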
trying to understand what reset the task
Hi JitteryCoyote63
You don’t need to run it from the Trains Server machine, you just need a ~/trains.conf
file configured to point at your Trains Server
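A minimal ~/trains.conf sketch (the hostnames and keys below are placeholders; use your own server addresses and generated credentials):

```
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
    credentials {
        access_key: "generated-access-key"
        secret_key: "generated-secret-key"
    }
}
```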
Hi GiddyTurkey39 , it should be released next week, I will update you on this thread once out 🙂
In the resources configuration, you have the subnet ID and the security group ID set, and it still failed?
thanks SmugTurtle78 , checking it
Yes, this fix is almost done with testing and will be part of the next release, will keep you updated about it
Hi WackyRabbit7 , saw you updated the GH issue, will try to reproduce it here
Hi ShinyLobster84 , where do you usually install XXXXX package from? or some artifactory?
Nope, this example uses a local path (for the PIL image)
Hi DefeatedCrab47
If you are referring to this example, examples/frameworks/tensorboardx/pytorch_tensorboardX.py, it only has test and train steps.
If you'd like to plot validation together with train, you can use the same prefix, for example writer.add_scalar('<prefix>/Test_Loss', ...), like in this example - https://demoapp.trains.allegro.ai/projects/bb21d73db5634e6d837724366397b6e2/experiments/f46160152ee54ff9863bb2efe554a6b1/output/metrics/scalar
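To illustrate the grouping, here is a hedged sketch of how a `<prefix>/<series>` tag splits into a plot title and a series name (the helper below is illustrative, not ClearML's actual parser):

```python
def split_tag(tag):
    """Split a TensorBoard tag into (plot_title, series_name).

    Scalars that share the same prefix before the first '/' end up on the
    same plot, so 'Loss/Train' and 'Loss/Test' are drawn together.
    """
    prefix, sep, series = tag.partition("/")
    if not sep:
        # No prefix: the tag is its own plot with a single series.
        return tag, tag
    return prefix, series

print(split_tag("Loss/Train"))  # ('Loss', 'Train')
print(split_tag("Loss/Test"))   # ('Loss', 'Test')
```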
Try:
dataset = Dataset.create(data_name, project_name)
dataset_id = dataset.id
Hi VexedCat68 ,
How do you create it? With Dataset.create?
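For reference, a typical create-and-upload flow looks roughly like this (a sketch assuming `clearml` is installed and a server is configured; the names and folder are placeholders):

```python
def create_and_upload(name, project, folder):
    """Create a dataset, add a local folder, upload it, and finalize it."""
    from clearml import Dataset  # lazy import; assumes clearml is installed
    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=folder)   # stage local files into the dataset
    dataset.upload()                 # push the files to the configured storage
    dataset.finalize()               # lock the dataset version
    return dataset.id

# Example (placeholder names):
# dataset_id = create_and_upload("my-dataset", "my-project", "./data")
```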
Hi TenseOstrich47 ,
Try using AWS credentials with a region too: https://github.com/allegroai/clearml/blob/master/docs/clearml.conf#L88
credentials: [
    # specifies key/secret credentials to use when handling s3 urls (read or write)
    {
        bucket: "my-bucket-name"
        key: "my-access-key"
        secret: "my-secret-key"
        region: "my-region"
    },
]
Hi SmugTurtle78 , can you share your configuration? (without the secrets)
- Are you working in a VPC? Did you try configuring only one of the params?
Hi JitteryCoyote63 ,
The V100 will be used; in an upcoming version we will have priorities too, will keep you updated about it
MinuteWalrus85 thanks for the screenshot, asking about the TB dashboard to understand where the issue is coming from.
Trains patches the TB stats and shows them to you in the web app, so if the results are the same in the TB dashboard, the reported values themselves may be wrong; if the TB dashboard and the web app show different results, there may be an issue with the web-app reporting
Hi OutrageousSheep60 , I think connect_configuration is the solution for this one (or connect)
ClearML uses the access and secret keys for creating the storage object; you can set those as environment variables too
This seems to be the same issue like in https://clearml.slack.com/archives/CTK20V944/p1633599511350600
What's the pyjwt version you are using?
Hi SubstantialElk6 ,
You can configure S3 credentials in your ~/clearml.conf
file, or with environment variables:
os.environ['AWS_ACCESS_KEY_ID'] = "***"
os.environ['AWS_SECRET_ACCESS_KEY'] = "***"
os.environ['AWS_DEFAULT_REGION'] = "***"
Hi MotionlessMonkey27 ,
first, I’m getting a warning:
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
This simply indicates that your task has not started reporting metrics to the server yet. Once reporting starts, it will switch back to iteration-based.
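If you want to report a scalar at an explicit iteration yourself, a hedged sketch (assuming `clearml` is installed and a task is running; the title/series names are placeholders):

```python
def report_loss(value, iteration):
    """Report a scalar at an explicit iteration so the monitor can detect it."""
    from clearml import Logger  # lazy import; assumes clearml is installed
    # title/series below are placeholder names for the plot and the curve
    Logger.current_logger().report_scalar(
        title="train", series="loss", value=value, iteration=iteration
    )
```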
Also, ClearML is not detecting the scalars, which are being logged as follows:
tf.summary.image('output', output_image, step=self._optimizer.iterations.numpy())
or
for key, value in...
Hi UnevenDolphin73 ,
Which region are you using? A machine with or without GPU(s)?
SquareFish25 Will try to reproduce it
Hi FloppyDeer99 ,
In other words, the docs state that ClearML Open Source supports orchestration; where can I find the related code?
You can find many examples at https://clear.ml/docs/latest/docs/getting_started/mlops/mlops_first_steps/ ; if you have a specific use case you want to check, please share it and I can send an example.
And what is the role of clearml-agent in orchestration, a combination of kube-scheduler and kubelet?
ClearML agent is an ML-Ops tool for users to r...
With this scenario, your data should be updated when running the pipeline
When you run it locally with auto_connect_frameworks={"matplotlib": False} , did it send the matplotlib outputs?
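For context, a hedged sketch of disabling matplotlib auto-logging at Task.init (assuming `clearml` is installed and configured; the project/task names are placeholders):

```python
def init_without_matplotlib():
    """Start a ClearML task with matplotlib auto-logging disabled."""
    from clearml import Task  # lazy import; assumes clearml is installed and configured
    return Task.init(
        project_name="examples",        # placeholder project name
        task_name="no-matplotlib",      # placeholder task name
        auto_connect_frameworks={"matplotlib": False},
    )
```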