@<1651395720067944448:profile|GiddyHedgehong81> just to be clear, Dataset.get_local_copy returns a path to the folder containing your files.
You have to manually add the additional path to the specific files you need to use; it does not know that in advance.
That was the initial issue you had, and I assume it is the same one here. Does that make sense?
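For illustration, a minimal sketch of that (the dataset/project names and the relative file path are hypothetical):
```
import os
from clearml import Dataset

# get_local_copy() returns the root folder of the local dataset copy;
# the path to the specific file inside it must be added manually.
dataset = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_folder = dataset.get_local_copy()
file_path = os.path.join(local_folder, "images", "sample_0001.png")
```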
but then the error occurs after the training and the validation were successfully completed
It seems it is failing on the last eval? Could it be the test set is missing? Is it the same dataset? Can you verify the file is there? (Notice I see a mix of / and \ in the file name, which is odd: Windows uses \ and Linux/macOS use /, so you should never see a mix.)
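As a quick sanity check for such a path, something like this (the mixed-separator path below is a hypothetical example):
```
from pathlib import Path

# A mixed-separator path of the kind seen in the error (hypothetical)
p = Path(r"data\val/images/sample.png")

# On Windows, Path() accepts both separators; on Linux/macOS the backslash is
# just a character in the filename, so exists() will most likely be False.
print(p, p.exists())
```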
WackyRabbit7 I guess we are discussing this one on a diff thread 🙂 but yes, it should totally work, that's the idea
(without having to execute it first on Machine C)
Someone somewhere has to create the definition of the environment...
The easiest way to go about it is to execute it once.
You can add the following line to your code: task.execute_remotely(queue_name='default')
This will cause your code to stop running and enqueue itself on a specific queue.
Quite useful if you want to make sure everything works (like running a single step), then continue on another machine.
Notice that switching between cpu...
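A minimal sketch of that flow (project/task/queue names are placeholders):
```
from trains import Task

task = Task.init(project_name="examples", task_name="remote run")

# Everything above this line runs locally (e.g. to verify a single step works);
# execute_remotely() stops the local run and enqueues the task on 'default'.
task.execute_remotely(queue_name='default')

# From here on, the code only runs on the agent that pulls the task
train()  # hypothetical training function
```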
Is it possible to launch a task from Machine C to the queue that Machine B's agent is listening to?
Yes, that's the idea
Do I have to have anything installed (aside from the trains PIP package) on Machine C to do so?
Nothing, pure magic 🙂
Closing the dataset doesn't work: dataset.close() raises AttributeError: 'Dataset' object has no attribute 'close'
Hi @<1523714677488488448:profile|NastyOtter17> could you send the full exception?
A few implementation / design details:
When you run code with Trains (and call Task.init) it will record your environment (python packages, git code, uncommitted changes, etc.). Everything is stored on the Task object in the trains-server. When you clone a task, you literally create a copy of the Task object (i.e. a second experiment). On the cloned experiment you can edit everything (parameters, git, base docker image, etc.). When you enqueue a Task, you add its ID to the execution queue list; a trains-a...
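For reference, the same clone-and-enqueue flow can also be driven from code; a sketch (the task ID is hypothetical):
```
from trains import Task

# Fetch the original experiment, clone it, then add the clone to a queue
template = Task.get_task(task_id='aabbccdd11223344')
cloned = Task.clone(source_task=template, name='cloned experiment')
Task.enqueue(cloned, queue_name='default')
```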
Sorry my bad, you are looking for:
None
Can you test with the latest RC: pip install clearml==1.0.3rc0
There is no dataset.close() 🙂
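In case the goal was to finish creating a dataset, a minimal sketch of the usual completion flow instead (names and paths are placeholders):
```
from clearml import Dataset

dataset = Dataset.create(dataset_project="my_project", dataset_name="my_dataset")
dataset.add_files(path="./data")

# A dataset is completed with upload() + finalize(); there is no close()
dataset.upload()
dataset.finalize()
```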
FranticCormorant35
See here https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py#L42
So if I plot an image with matplotlib, it would not upload? I need to use the logger.
Correct, if you have no "main" task, no automagic 🙂
So how can I make it run with the "automagic"?
Automagic logs a single instance... unless those are subprocesses, in which case the main task takes care of "copying" itself to the subprocess.
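As a minimal sketch of the automagic path (project/task names are placeholders): once Task.init creates the main task, matplotlib figures shown in that process are captured without explicit logger calls.
```
import numpy as np
import matplotlib.pyplot as plt
from trains import Task

# Creating the main task hooks matplotlib, so plt.show() uploads the figure
task = Task.init(project_name="examples", task_name="matplotlib automagic")

plt.plot(np.arange(10))
plt.show()
```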
Again what is the use case for multiple machines?
It has to be alive so all the "child nodes" can report to it...
import os
from trains import Task

os.environ['TRAINS_PROC_MASTER_ID'] = '1:da0606f2e6fb40f692f5c885f807902a'
os.environ['OMPI_COMM_WORLD_NODE_RANK'] = '1'
task = Task.init(project_name="examples", task_name="Manual reporting")
print(type(task))
Should be: <class 'trains.task.Task'>
I want to optimize hyperparameters with trains.automation but: ...
Yes, you are correct: in the case of the example code it should be "General/...", and if you have ArgParser it should be "Args/...". And yes, it looks like the metric is wrong, it should be "epoch_accuracy" & "epoch_accuracy" (objective metric title & series).
I'll make sure we fix the example, because as you pointed out, it is broken :(
you are correct, I was referring to the template experiment
A few epochs is just fine
Example use case:
```
an_optimizer = HyperParameterOptimizer(
    # This is the experiment we want to optimize
    base_task_id=args['template_task_id'],
    # here we define the hyper-parameters to optimize
    hyper_parameters=[
        UniformIntegerParameterRange('General/layer_1', min_value=128, max_value=512, step_size=128),
        UniformIntegerParameterRange('General/layer_2', min_value=128, max_value=512, step_size=128),
        DiscreteParameterRange('General/batch_size', values=[...
```
PungentLouse55, make sure you fix the metric objective and args:
Add the "General/" prefix to the list of arguments to optimize, and change the objective metric from "Accuracy" to "epoch_accuracy"; see the sketch below.
PungentLouse55 could you test again with the latest from GitHub? I think the issue should be solved: pip install git+
Yes, sorry, that wasn't clear 🙂
Thanks @<1523703472304689152:profile|UpsetTurkey67>
I'm pretty sure it has!
Let me check how we can merge it into the clearml-agent, sounds good?
Hi EnviousStarfish54
You mean the console output? If that's the case, the Task.init call will monkey-patch sys.stdout/sys.stderr to report to clearml
as well as the console
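A tiny sketch of what that capture means in practice (project/task names are placeholders):
```
from trains import Task

task = Task.init(project_name="examples", task_name="console capture")

# After Task.init, sys.stdout/sys.stderr are wrapped, so this print shows up
# both in the local terminal and in the task's console log on the server
print("hello from the main process")
```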
The log is missing, but the Kedro logger is printing to sys.stdout in my local terminal.
I think the issue might be that it starts a new subprocess, and that subprocess is not "patched" to capture the console output.
That said, if an agent is running the entire pipeline, everything is logged from the outside, so whatever is written to stdout/stderr is captured.
EnviousStarfish54 following up on this issue, the root cause is that dictConfig will remove all handlers if it is not passed "incremental": True:
conf_logging = { "incremental": True, ... }
Since you pointed out that Kedro is internally calling logging.config.dictConfig(conf_logging), this seems like an issue with Kedro, as that call will remove all logging handlers, which seems problematic. wdyt?
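To illustrate the difference, a minimal sketch of the dictConfig behavior (the logger entry is hypothetical): without "incremental": True, pre-existing handlers are dropped; with it, they survive and only levels are updated.
```
import logging
import logging.config

# Pre-existing handler (stands in for the one Task.init attaches)
logging.getLogger().addHandler(logging.StreamHandler())

conf_logging = {
    "version": 1,           # required by dictConfig
    "incremental": True,    # keep existing handlers, only update levels
    "loggers": {"kedro": {"level": "INFO"}},  # hypothetical logger config
}
logging.config.dictConfig(conf_logging)

# The StreamHandler added above is still attached
print(logging.getLogger().handlers)
```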
EnviousStarfish54 Yes, I'm not sure what happens there, we will have to dive deeper, but now that you got us a code snippet to reproduce the issue, it should not be very complicated to fix (I hope 🙂)