Answered
[Task gets interrupted / aborted / reset when in offline mode]
For local testing, we have added a --no-clearml option to our code that sets task.set_offline(True) directly after the task is created.
However, this results in the process getting interrupted, the outputs show:
2022-11-04 02:31:24,897 - clearml.Task - WARNING - Task d627ee8da785410d91bce2309a4c1b8a was reset! if state is consistent we shall ... SOME OTHER LOGS ...
2022-11-04 02:31:26,899 - clearml.Task - WARNING - Task d627ee8da785410d91bce2309a4c1b8a was reset! if state is consistent we shall terminate.
2022-11-04 02:31:28,900 - clearml.Task - WARNING - Task d627ee8da785410d91bce2309a4c1b8a was reset! if state is consistent we shall terminate.
2022-11-04 02:31:30,902 - clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - RESET ###
All function calls to the task are wrapped like if not task.is_offline(): task.XXX(), and it does not seem to matter when set_offline() is called. The same program runs through without offline mode, of course. Any ideas?

  
  
Posted 2 years ago

Answers 12


It might be broken for me, as I said the program works without the offline mode but gets interrupted and shows the results from above with offline mode. But there might be another issue in between of course - any idea how to debug?
The environment variable is good to know, I will try with that as well and report back.

  
  
Posted 2 years ago

For local testing, we have added a

ScantChimpanzee51 there is already an environment variable for that, you can just set CLEARML_OFFLINE_MODE 🙂
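As a minimal sketch of the environment-variable route mentioned above: the variable name CLEARML_OFFLINE_MODE comes from this thread, while the value "1", the helper name enable_clearml_offline, and the choice to set it before clearml is imported are assumptions made for illustration, not confirmed behavior.

```python
import os


def enable_clearml_offline(env=os.environ):
    """Set CLEARML_OFFLINE_MODE in the given environment mapping.

    Hypothetical helper; the value "1" is an assumption.
    """
    env["CLEARML_OFFLINE_MODE"] = "1"
    return env


def main():
    # Set the variable first, then import clearml, so the library
    # sees it however early it reads its configuration (assumption).
    enable_clearml_offline()
    from clearml import Task

    task = Task.init(project_name="debug", task_name="offline test")
    print("offline:", Task.is_offline())
    task.close()
```

Calling main() requires clearml to be installed. Exporting the variable in the shell before launching the script should have the same effect and keeps the code free of any offline-specific branch.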

By the way, if we don’t wrap other calls in is_offline() we get errors like “DateTime object is not serializable”, but that’s a secondary issue.

I think this was fixed. Can you verify with the latest RC, 1.7.3rc0? If it still happens, can you share the code?

However, this results in the process getting interrupted, the outputs show:

Are you saying offline mode is broken?

  
  
Posted 2 years ago

It might be broken for me, as I said the program works without the offline mode but gets interrupted and shows the results from above with offline mode.

How could I reproduce this issue?

But there might be another issue in between of course - any idea how to debug?

I think I missed this one. What exactly is the issue?

  
  
Posted 2 years ago

BTW, this one seems to work ....
` from time import sleep
from clearml import Task

# Enable offline mode before the task is created
Task.set_offline(True)
task = Task.init(project_name="debug", task_name="offline test")

print("starting")

for i in range(300):
    print(f"{i}")
    sleep(1)

print("done") `
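The --no-clearml flag from the question could be wired to the same call order as the snippet above, with Task.set_offline(True) called before Task.init rather than after it. This is only a sketch: the function names parse_args and init_task are hypothetical, and it is not established in this thread that the call order is what causes the reset.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --no-clearml requests offline mode."""
    parser = argparse.ArgumentParser(description="offline-mode demo")
    parser.add_argument(
        "--no-clearml",
        action="store_true",
        help="run the ClearML task in offline mode",
    )
    return parser.parse_args(argv)


def init_task(offline):
    """Create the task, switching to offline mode first if requested."""
    # Imported lazily so that argument parsing works without clearml.
    from clearml import Task

    if offline:
        # Class-level call, made before the task exists.
        Task.set_offline(True)
    return Task.init(project_name="debug", task_name="offline test")


def main(argv=None):
    args = parse_args(argv)
    task = init_task(offline=args.no_clearml)
    print("starting")
    task.close()
```

Running main(["--no-clearml"]) needs clearml installed; parse_args can be exercised on its own.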

  
  
Posted 2 years ago

I meant that maybe activating offline mode somehow changes something else in the runtime, and that in turn leads to the interruption. Let me try to build a minimal reproducible version 🙂

  
  
Posted 2 years ago

Thanks ScantChimpanzee51!
Let me see what I can find, should be easy enough to fix now 🙂

  
  
Posted 2 years ago

Hi AgitatedDove14, it took some time, but I’ve finally managed to reproduce it. The issue seems to be related to writing images via TensorBoard:
` from torch.utils.tensorboard import SummaryWriter
import torch
from clearml import Task, Logger

if __name__ == "__main__":
    task = Task.init(project_name="ClearML-Debug", task_name="[Mac] TB Logger, offline")
    tb_logger = SummaryWriter(log_dir="tb_logger/demo/")
    # Random HWC image, logged ten times through TensorBoard
    image_tensor = torch.rand(256, 256, 3)
    for step in range(10):
        tb_logger.add_image("images/image123/img", image_tensor, step, dataformats="HWC")

    task.flush(wait_for_uploads=True) `
Again the errors show up as

2022-11-09 09:47:27,602 - clearml.metrics - WARNING - Failed uploading to /Users/manuel/.clearml/cache/offline/offline-028b2df9167049eba4bdce7c6f89f39e/data (Target path "/Users/manuel/.clearml/cache/offline/offline-028b2df9167049eba4bdce7c6f89f39e/data" does not exist)
Any idea about that? I’m also happy to open an issue on GitHub with the details if you like 🙂
By the way, no rush about this - we will turn off TB logging in the meantime.
Thanks for all the help!

  
  
Posted 2 years ago

Let me try to build a minimal reproducible version

Thank you!

  
  
Posted 2 years ago

Happy to and thanks!

  
  
Posted 2 years ago

So AgitatedDove14, if we use the CLEARML_OFFLINE_MODE environment variable instead, the program runs through again.
The only thing is that now we get errors of the form
0%| | 0/18 [00:00<?, ?image/s]ClearML running in offline mode, session stored in /home/manuel/.clearml/cache/offline/offline-167ceb1cd3c946df8abc7206b781b486
2022-11-07 07:49:06,986 - clearml.metrics - WARNING - Failed uploading to /home/manuel/.clearml/cache/offline/offline-167ceb1cd3c946df8abc7206b781b486/data (Target path "/home/manuel/.clearml/cache/offline/offline-167ceb1cd3c946df8abc7206b781b486/data" does not exist)
I’ve checked the path: /home/manuel/.clearml/cache/offline/offline-167ceb1cd3c946df8abc7206b781b486/ exists, but there is no data directory in it. Any idea where that could come from? Could we turn off the local logging as well - in these kinds of runs we don’t need it?

  
  
Posted 2 years ago

Any idea where that could come from? Could we turn off the local logging as well - in these kinds of runs we don’t need it?

It is supposed to create it automatically... I tested with other examples (clearml version 1.7.3rc1) and everything seems to work.
What am I missing? How do we recreate the issue? Can you verify it is still not working with the latest RC?

  
  
Posted 2 years ago

By the way, if we don’t wrap other calls in is_offline() we get errors like “DateTime object is not serializable”, but that’s a secondary issue.

  
  
Posted 2 years ago