anyway, my ultimate goal is to create templates for other tasks... Is that possible in any other way through the CLI?
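In the meantime, the one programmatic route I know of is registering a draft task via Task.create, which builds the task without running it - a sketch, with hypothetical project/repo names:
` # sketch: register a draft task that can later serve as a pipeline template
# (project/repo names below are hypothetical placeholders)
from clearml import Task

template = Task.create(
    project_name='my_project',
    task_name='my_template_task',
    repo='https://github.com/me/my_repo.git',
    script='tasks/data_projection.py',
)
`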
CostlyOstrich36 so why 1000:1000? My user and group are not that, and neither are all the other files I have under /opt/clearml
why not use my user and group?
I double-checked the credentials in the configuration, and they have full EC2 access
` # define pipeline (import added for completeness)
import clearml

pipe = clearml.PipelineController(
    name=TASK_NAME,
    project=PROJECT_NAME,
    version='0.0.1',
    add_pipeline_tags=False,
)
pipe.set_default_execution_queue('default')

# add steps
pipe.add_step(name=f'{start_date_train}_{end_date_train}_choose_best',
              base_task_project=CHOOSE_PROJECT_NAME,
              base_task_name=CHOOSE_TASK_NAME,
              parameter_override=params_override,
              ...
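(side note on the paste: it's cut off - after the steps are added, the controller still has to be launched, e.g. with pipe.start(), otherwise nothing gets enqueued)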
I think you are talking about separate problems - the "WARNING DIFF IS TOO LARGE" is only a UI issue, meaning you can't see the diff in the UI - correct me if I'm wrong about this
Maria seems to be saying that the execution FAILS when she has uncommitted changes, which is not the expected behavior - am I right, Maria?
(I'm working with Maria)
essentially, what Maria is saying is that when she has a script with uncommitted changes and executes it remotely, the script that actually runs on the remote machine does not include the uncommitted changes
e.g.:
Her git status is clean; she makes some changes to script.py and executes it remotely. What gets executed remotely is the original script.py, not the modified version she has locally
So regarding 1, I'm not really sure what the difference is
When running in docker mode, what is different from the regular mode? Nowhere in the instructions is nvidia-docker listed as a prerequisite, so how exactly will GPU tasks get executed?
I feel I don't understand the mechanism well enough to see (1) the difference between docker mode and regular mode, and (2) the use case for each
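(my current understanding, for the record - please correct me: docker mode means starting the agent with something like clearml-agent daemon --queue default --docker <default_image>, so each task runs inside its own container; for GPU tasks inside containers the host would then need the NVIDIA container runtime, which is presumably where nvidia-docker comes in)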
I set it to true and restarted my agent
What do you mean by submodules?
She did not push - I told her she does not have to push before executing, since trains figures out the diffs
When she pushes - it works
and then how would I register the final artifact to the pipeline? AgitatedDove14 ⬆
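Something like this is what I have in mind, if it helps - a sketch, assuming upload_artifact on the controller's own task is the right mechanism:
` # sketch: attach the final result to the pipeline's own task
# (inference_df stands for the pandas DataFrame; the artifact name is hypothetical)
from clearml import Task

pipeline_task = Task.current_task()
pipeline_task.upload_artifact(name='inference_table', artifact_object=inference_df)
`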
pgrep -af trains shows that there is nothing running with that name
btw, my site packages setting is false - should it be true? In the config you pasted it's false, but you're asking about true, so I'm not sure which it should be
it seems that only the packages imported in the script are getting installed
I am noticing that the files are saved locally - is there any chance the files are overwritten during the run, or get deleted at some point and then replaced?
Yes, they are local - I don't think there is a possibility they are getting overwritten... but that depends on how clearml names them. I showed you the code that saves the artifacts, but that code runs multiple times from a given template with different values - essentially it creates the same task ~10 times with different param...
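If naming is the issue, the workaround I'm considering is baking the varying parameters into the artifact name so each run writes a distinct artifact - a sketch, with hypothetical parameter names:
` # sketch: make artifact names unique per run to avoid collisions
# (start_date/end_date stand for the per-run parameter values)
task.upload_artifact(
    name=f'inference_table_{start_date}_{end_date}',
    artifact_object=inference_df,
)
`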
Now I see the watermarks are 2GB
SuccessfulKoala55 this actually doesn't work
` from pyhocon import ConfigFactory, HOCONConverter

apiserver_conf = ConfigFactory.parse_file(API_SERVER_CONF_PATH)

# POINT 1
# (note: conf_content is rendered here, before the new user is appended below)
conf_content = HOCONConverter.to_hocon(config=ConfigFactory.from_dict(apiserver_conf.as_plain_ordered_dict()),
                                       compact=False,
                                       level=0, indent=2)
apiserver_conf['auth']['fixed_users']['users'].append(
    ConfigFactory.from_dict({'username': username, 'password': password, 'name': name}))
##...
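For what it's worth, here's the write-back I'd expect to follow - a sketch, assuming the goal is to persist the updated users list (note it re-renders after the append, unlike POINT 1 above):
` # sketch: re-render the config AFTER appending the user, then write it back
updated_content = HOCONConverter.to_hocon(
    config=ConfigFactory.from_dict(apiserver_conf.as_plain_ordered_dict()),
    compact=False, level=0, indent=2)
with open(API_SERVER_CONF_PATH, 'w') as f:
    f.write(updated_content)
`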
so putting the docs aside, what permissions should I give to the IAM role associated with trains' autoscaler?
Maybe even a dedicated argument specifically for apt-get packages, since it is very common to need stuff like that
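(in the meantime, the closest existing hook I'm aware of is the agent's extra_docker_shell_script entry in clearml.conf, which can run apt-get before the task starts - a sketch, package names hypothetical:)
` # clearml.conf, agent section - sketch
agent {
    # shell lines executed inside the docker before the experiment runs
    extra_docker_shell_script: ["apt-get update", "apt-get install -y libsm6"]
}
`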
` # Python 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0]
clearml == 1.0.5
hyperopt == 0.2.5
matplotlib == 3.4.3
numpy == 1.21.2
pandas == 1.3.2
plotly == 5.3.0
python_dateutil == 2.8.2
scikit_learn == 0.24.2
statsmodels == 0.12.2
tqdm == 4.62.2
Detailed import analysis
**************************
IMPORT PACKAGE clearml
tasks/data_projection.py: 9
tasks/hp_optimization.py: 6
tasks/hpo_n_best_evaluation.py: 6
tasks/pipelines/monthly_predictions.py: 4
IMPORT PACKAGE hypero...
the inference table is a pandas DataFrame
SuccessfulKoala55 seems like you got it spot on, it contains the entire repo, but no .git directory
So what can we do about it? All I want is to create templates for some tasks, so I can later execute them through a PipelineController
a machine that had a previous installation, but I deleted the /opt/trains directory beforehand
I'm trying it now
Gotcha - I didn't think of an external server, since Service Containers are part of GitHub's offering. I'll consider that
